self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T;
Mol Gen Genet 1998;258:397-403. 



603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a
membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulfur
protein, and a hydrophobic component composed of a cytochrome b and a membrane anchor
protein. The cytochrome b component is a monoheme transmembrane protein [1,2,3]
belonging to a family that groups:
 - Cytochrome b-556 from bacterial SDH (gene sdhC).
 - Cytochrome b560 from the mammalian mitochondrial SDH complex.
 - Cytochrome b560 subunit encoded in the mitochondrial genome of some algae and in the plant Marchantia polymorpha.
 - Cytochrome b from yeast mitochondrial SDH complex (gene SDH3 or CYB3).
 - Protein cyt-1 from Caenorhabditis.
These cytochromes are proteins of about 130 residues that comprise three transmembrane regions. Two conserved histidines may be
involved in binding the heme group. Two signature patterns have been developed that
include these histidine residues.

Consensus pattern: R-P-ff4VMT}[I JVMT SEP ID NO: I rj_ x (3)4M\^ f LIYM SEP ID 
NiMij-x(6)-f-L^^ [ST] [H could 

be a heme ligand]

Consensus pattern: H-x(3HGA]-{«¥M^ 

fWVMH[L]YMI [H 
could be a heme ligand] 

[ 1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992).
[ 2] Abraham P.R., Mulder A., Van't Riet J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994).
[ 3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger J.M., Kloareg B. J. Mol. Biol. 250:484-495(1995).



604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and
general secretion. Halachmi N, Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40 

605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components,
with secY and secA, of the preprotein translocase. In eukaryotes, the evolutionarily related protein
sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is
part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both secE and
sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane
region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses
an extra N-terminal segment of 60 residues that contains two additional transmembrane
domains). The sequence of secE/sec61-gamma is not extremely well conserved; however, it is
possible to derive a signature pattern centered on a conserved proline located 10 residues
before the beginning of the transmembrane domain.
Consensus pattern: ttPA'ff¥j[!JYMI 

15 Cl.NO:55^ 

[SEQ]-x(7)4LW4^ 

[ 1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).

606. 11-S plant seed storage proteins signature

Plant seed storage proteins, whose principal function appears to be to serve as the major nitrogen source
for the developing plant, can be classified, on the basis of their structure, into different
families. 11-S proteins are non-glycosylated proteins which form hexameric structures [1,2]. Each of
the subunits in the hexamer is itself composed of an acidic and a basic chain derived from a
single precursor and linked by a disulfide bond. This structure is shown in the following
representation.

           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
                                  *********
<---------Acidic-subunit---------><------Basic-subunit------>
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape
cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin,
oat globulin, sunflower helianthinin G3, etc. The region that includes the conserved cleavage
site between the acidic and basic subunits (Asn-Gly) and a proximal cysteine residue which is
involved in the interchain disulfide bond has been used as a signature pattern for this family
of proteins.

Consensus pattern: N-G-x-[DE](2)-x4feP^M^ 1,12)- 
[PAG]-D [C is involved in a disulfide bond 

[ 1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988).
[ 2] Shotwell M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).

607. 7S seed storage protein 

7S globulin is one of the main storage proteins of most angiosperms and 
gymnosperms. The 7S storage proteins are homotrimers.
Number of members: 67 

[1] The three-dimensional structure of canavalin from jack bean (Canavalia 
ensiformis). Ko TP, Ng JD, McPherson A; Plant Physiol 1993;101:729-744. 

608. Aspartate-semialdehyde dehydrogenase signature 

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common
biosynthetic pathway leading from Asp to diaminopimelate and Lys, to Met, and to Thr: the
NADP-dependent reductive dephosphorylation of L-aspartyl phosphate to
L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370 residues)
whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been
implicated as important for the catalytic activity [2]. The region of conservation around the
active site residue is too small to be used as a signature pattern. Another, more conserved
region, located in the last third of the sequence and containing both a conserved cysteine
and a histidine, has been used instead.

Consensus pattern: HAVM^LIVM SEP ID NO:4)]-^ SEQ ID N(.);7j >]-x(2)- 

C-x-R-^VM-f[LiVM SEP ID NO:4)1-x(;4V[GSC]-H-[STA 

[ 1] Baril C., Richaud C., Fourni E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992).
[ 2] Karsten W.E., Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

N-acetyl-gamma-glutamyl-phosphate reductase active site

N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme
that catalyzes the third step in the biosynthesis of arginine from glutamate, the
NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate
5-semialdehyde. In bacteria it is a monofunctional protein of 35 to 38 Kd (gene argC), while in
fungi it is part of a bifunctional mitochondrial enzyme (gene ARG5,6, arg11 or arg-6) which
contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal AGPR
domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the
catalytic activity; the region around this residue is well conserved and can be used as a
signature pattern.

Consensus pattern : f h WM][ L \ VM. S E Ql D .NO: 4}]- [GS A] -x-P-G - C- [F Y] - [A VP] -T- [G A] - 
x(3)-{G=FAG}EjT AC^ x-P [C is the active 

site residue] 

[ 1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992).
[ 2] Gessert S.F., Kim J.H., Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).

609. Sialyltransferase family
Number of members: 18

610. SpoU rRNA Methylase family 

This family of proteins probably uses S-AdoMet.
Number of members: 58
[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases.
Koonin EV, Rudd KE; Nucleic Acids Res 1993;21:5519-5519.
[2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential for tRNA (Gm18) 2'-O-methyltransferase activity.
Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:4093-4097.



611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular
protein, present in a variety of phosphorylated forms, which serves as a relay for diverse
second messenger pathways. Its expression and phosphorylation are regulated throughout
development and in response to extracellular signals regulating cell proliferation,
differentiation and function. Stathmin is a highly conserved protein of 149 amino acid
residues. Structurally, it consists of an N-terminal domain of about 45 residues followed by a
78-residue alpha-helical domain consisting of a heptad repeat coiled coil structure and a
C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific, membrane-associated
protein that accumulates in the growth cones of developing neurons. It is highly similar in its
sequence to stathmin, but differs in that it contains an additional N-terminal hydrophobic
segment of 32 residues which is probably responsible for its interaction with membranes.
Xenopus protein XB3 is also evolutionarily related to stathmin and also contains an additional
N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central
region of the coiled coil have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E- 
Consensus pattern: A-E-K-R-E-H-E-[KR]-E- 

[ 1] Sobel A. Trends Biochem. Sci. 16:301-305(1991).
[ 2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).
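
Consensus patterns of this kind can be applied to a sequence mechanically once they are translated into a regular expression. The following is a minimal sketch, not part of the original text: the function name `prosite_to_regex` and the toy peptide are hypothetical, the two stathmin patterns are taken as printed above with the trailing hyphen ignored, and only the notation that appears in this document is handled (fixed residues, `x`, `[...]`, `{...}` and repeat counts such as `(2)` or `(11,12)`).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    # Elements are separated by hyphens; a stray leading/trailing hyphen is ignored.
    for element in pattern.strip().strip("-").split("-"):
        # Split off an optional repeat count such as (2) or (11,12).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = re.escape(core)           # a single fixed residue
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)

# The two stathmin patterns as printed in this entry.
STATHMIN_1 = prosite_to_regex("P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E-")
STATHMIN_2 = prosite_to_regex("A-E-K-R-E-H-E-[KR]-E-")

# Hypothetical toy peptide (not a real protein) that satisfies the first pattern.
toy_peptide = "PKKKDLSLEE"
first_hit = re.search(STATHMIN_1, toy_peptide)
```

Under these assumptions the first pattern becomes `P[KRQ][KR]{2}[DE].SL[EG]E`, which spans exactly ten residues, in agreement with the decapeptide described in the text.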


612. SUA5/yciO/yrdC family signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
 - Yeast protein SUA5.
 - Escherichia coli hypothetical protein yciO and HI1198, the corresponding Haemophilus influenzae protein.
 - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding Haemophilus influenzae protein.
 - Bacillus subtilis hypothetical protein ywlC.
 - Mycobacterium leprae hypothetical protein in rfe-hemK intergenic region.
 - Methanococcus jannaschii hypothetical protein MJ0062.
These are proteins of 20 to 46 Kd which contain a number of conserved
regions in their N-terminal section. They can be picked up in the database by the following
pattern.

Consensus pattern: ^bl VM4VVRLI V^^^ 
[ 1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).

613. Sucrose synthase

Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This
family includes the bulk of the sucrose synthase protein. However, the carboxyl terminal
region of the sucrose synthases belongs to the glycosyl transferase family Glycos_transf_1.

614. Sulfotransferase proteins 
Number of members: 59 

615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane
of synaptic vesicles, which may function as ionic or solute channels. These two glycoproteins
seem to span the membrane four times. Both their N- and C-termini seem to be
cytoplasmically located. As a signature pattern for this family of proteins, a highly conserved
region located at the beginning of the first intravesicular loop, just after the first
transmembrane domain, has been selected. This region contains a cysteine residue that may be
involved in a disulfide bond.

Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[ 1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).

616. Syndecans signature

Syndecans [1,2] (from the Greek syndein, to bind together) are a family of transmembrane
heparan sulfate proteoglycans which are implicated in the binding of extracellular matrix
components and growth factors. Syndecans bind a variety of molecules via their heparan
sulfate chains and can act as receptors or as co-receptors. Structurally, these proteins consist
of four separate domains:
 a) A signal sequence;
 b) An extracellular domain (ectodomain) of variable length whose sequence is not evolutionarily conserved in the various forms of syndecans. The ectodomain contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
 c) A transmembrane region;
 d) A highly conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins.
The proteins known to belong to this family are:
 - Syndecan 1.
 - Syndecan 2 or fibroglycan.
 - Syndecan 3 or neuroglycan or N-syndecan.
 - Syndecan 4 or amphiglycan or ryudocan.
 - Drosophila syndecan.
 - Caenorhabditis elegans probable syndecan (F57C7.3).
The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This
region, which contains four basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y

[ 1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393(1992).
[ 2] David G. FASEB J. 7:1023-1030(1993).
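
The statements made about the syndecan pattern (one transmembrane residue plus ten cytoplasmic residues, four of them basic) can be checked by writing the pattern out position by position and enumerating every peptide it allows. This is an illustrative sketch only: the names `POSITIONS`, `BASIC`, `expand` and `basic_count` are hypothetical, and "basic" is assumed here to mean lysine or arginine.

```python
from itertools import product

# The pattern [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y, one entry per residue position
# (K(2) is expanded into two K positions).
POSITIONS = ["FY", "R", "IM", "KR", "K", "K", "D", "E", "G", "S", "Y"]

# Assumption: "basic" residues are taken to be Lys (K) and Arg (R).
BASIC = set("KR")

def expand(positions):
    """Return every concrete peptide allowed by the per-position choices."""
    return ["".join(choice) for choice in product(*positions)]

def basic_count(peptide):
    """Count the basic residues (K or R) in a peptide."""
    return sum(residue in BASIC for residue in peptide)

peptides = expand(POSITIONS)
```

With these definitions `peptides` holds 2 x 2 x 2 = 8 sequences of 11 residues each, and every one of them contains exactly four basic residues, matching the description of the region as a possible stop transfer site.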

617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionarily related [1,2,3]:
 - Epimorphin (or syntaxin 2), a mammalian mesenchymal protein which plays an essential role in epithelial morphogenesis.
 - Syntaxin 1A (also known as antigen HPC-1) and syntaxin 1B, which are synaptic proteins which may be involved in docking of synaptic vesicles at presynaptic active zones.
 - Syntaxin 3.
 - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic active zones.
 - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport.
 - Syntaxin 6, which is involved in intracellular vesicle trafficking.
 - Syntaxin 7.
 - Yeast PEP12 (or VPS6), which is required for the transport of proteases to the vacuole.
 - Yeast SED5, which is required for the fusion of transport vesicles with the Golgi complex.
 - Yeast SSO1 and SSO2, which are required for vesicle fusion with the plasma membrane.
 - Yeast VAM3, which is required for vacuolar assembly.
 - Arabidopsis thaliana protein KNOLLE, which may be involved in cytokinesis.
 - Caenorhabditis elegans hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3.
The above proteins share the following
characteristics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly
hydrophobic and is probably involved in anchoring the protein to the membrane; a central,
well conserved region, which seems to be in a coiled-coil conformation. The pattern specific
for this family is based on the most conserved region of the coiled coil domain.
Consensus pattern: [RQ]-x(3)-fUm^{DVMA ..SKQ..ID.MM0}j.-x(2)-{IviVM}[L.|V'|V1 SEQ
IDNQlM-rESHl-x^ SEP ID NO: 1 \1 -x-fDEVM4 f DEVM SEP ID 

NO;363)j- KrlVM HUVM SEP ID NO:4Y| -x(2)-feIW ti1JVM SEP ID NO:4Vf -[FS]-x^V 

[U^l}[LjVM.SJOJDNj3:^ 

{yADKO]IGADEp.SEQ..|D.Np:5 

SEQJLPJ^^^ 

x(2)-R^M : U E?VM SEP ID NO:4 )i 

[ 1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:863-873(1993).
[ 2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993).
[ 3] Pelham H.R.B. Cell 73:425-426(1993).

618. Sm protein

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein
particles (snRNPs) involved in pre-mRNA splicing contain seven
Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which
assemble around the Sm site present in four of the major
spliceosomal small nuclear RNAs. These proteins contain a
common sequence motif in two segments, Sm1 and Sm2, separated
by a short variable linker.

[1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:2076-2088.
[2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell 1999;96:375-387.



619. Skp1 family

[1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461. 



620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the
signal sequences of secretory proteins as well as with two other components of the protein
translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to
492 amino acid residues that apparently contains ten transmembrane segments. Such a
structure probably confers on secY a 'translocator' function, providing a channel for
periplasmic and outer-membrane precursor proteins. Homologs of secY are found in
archaebacteria [2]. SecY is also encoded in the chloroplast genome of some algae [3], where it
could be involved in a prokaryotic-like protein export system across the two membranes of
the chloroplast endoplasmic reticulum (CER) which is present in chromophyte
and cryptophyte algae. Two signature patterns have been developed for secY proteins. The
first corresponds to the second transmembrane region, which is the most conserved section of
these proteins. The second spans the C-terminal part of the fourth transmembrane region, a
short intracellular loop, and the N-terminal part of the fifth transmembrane region.

Consensus pattern: \GSTUU\ r M¥^lAVMF SEP ID NO:2)j(2Vx-f»¥Mj| LIVM SE C ID 

NO:4ji-G4«VMj^ 

[AS]- {GS-i-Q^ 

f-U^MFAi[LIVM 

Consensus pattern: fM¥M4^^ jI.JVMFYW SEP ID NQ:26)] (2)-x-[DE]-x- 
[NST]-G-x-[GST]-[HVMF|[LIVMF NO:2jJ(3) 

[ 1] Ito K. Mol. Microbiol. 6:2423-2428(1992).
[ 2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991).
[ 3] Douglas S.E. FEBS Lett. 298:93-96(1992).

621. (Seed protein) Small hydrophilic plant seed proteins signature

The following small hydrophilic plant seed proteins are structurally related:
 - Arabidopsis thaliana proteins GEA1 and GEA6.
 - Cotton late embryogenesis abundant (LEA) protein D-19.
 - Carrot EMB-1 protein.
 - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4.
 - Maize late embryogenesis abundant protein Emb564.
 - Radish late seed maturation protein p8B6.
 - Rice embryonic abundant protein Emp1.
 - Sunflower 10 Kd late embryogenesis abundant protein (DS10).
 - Wheat Em proteins.
These proteins contain from 83 to 153 amino acid residues
and may play a role [1,2] in equipping the seed for survival, maintaining a minimal level of
hydration in the dry organism and preventing the denaturation of cytoplasmic components.
They may also play a role during imbibition by controlling water uptake. The signature
pattern is based on the best conserved region in the sequence of these proteins, a
glycine-rich nonapeptide located in the N-terminal section.

Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

[ 1] Dure L. III, Crouch M., Harada J., Ho T.-H. D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol. 12:475-486(1989).
[ 2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M. Mol. Gen. Genet. 238:409-418(1993).
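
As an illustration of how the nonapeptide signature G-[EQ]-T-V-V-P-G-G-T might be located in a protein sequence, the sketch below scans a sequence and reports the 1-based start of each occurrence. It is a hedged example rather than a described method: the regular expression is a hand translation of the pattern, and `SEED_MOTIF`, `find_motif` and the toy sequence are hypothetical names and data, not taken from any real protein.

```python
import re

# Hand-written regular-expression form of G-[EQ]-T-V-V-P-G-G-T.
SEED_MOTIF = re.compile(r"G[EQ]TVVPGGT")

def find_motif(sequence):
    """Return (1-based start position, matched peptide) for each occurrence."""
    return [(m.start() + 1, m.group()) for m in SEED_MOTIF.finditer(sequence)]

# Hypothetical toy sequence, invented for this example only.
toy_sequence = "MASKQERLDAKAGETVVPGGTGGKSLEAQ"
hits = find_motif(toy_sequence)
```

For the toy sequence the single hit starts at residue 13. Comparing the start position with the sequence length gives a rough check that a match lies in the N-terminal section, as the text describes for these proteins.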


622. Serine carboxypeptidases, active sites 

All known carboxypeptidases are either metallo carboxypeptidases or
serine carboxypeptidases. The catalytic activity of the serine carboxypeptidases, like that of
the trypsin family serine proteases, is provided by a charge relay system involving an aspartic
acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are:
 - Barley and wheat serine carboxypeptidases I, II, and III [2].
 - Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides.
 - Yeast KEX1 protease, involved in killer toxin and alpha-factor precursor processing.
 - Fission yeast sxa2, a probable carboxypeptidase involved in degrading or processing mating pheromones [3].
 - Penicillium janthinellum carboxypeptidase S1 [4].
 - Aspergillus niger carboxypeptidase pepF.
 - Aspergillus saitoi carboxypeptidase cpdS.
 - Vertebrate protective protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of both beta-galactosidase and neuraminidase.
 - Mosquito vitellogenic carboxypeptidase (VCP) [6].
 - Naegleria fowleri virulence-related protein Nf314 [7].
 - Yeast hypothetical protein YBR139w.
 - Caenorhabditis elegans hypothetical proteins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2.
This family also includes:
 - Sorghum (S)-hydroxymandelonitrile lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis.
The sequences surrounding the active site serine and histidine residues are highly conserved in all these
serine carboxypeptidases.

Consensus pattern: [--tlVMj[LW [S is the 

active site residue]
Consensus pattern: fM¥Pf[y W

miMIll'X-m^m nVF^r SEP ID NO:564)1-x-ffiSONO^[GSDNOL SEP IDNO;565)] - 
fSAGV4{j^^ fWAQ4i JVAO SE P ID NQ:566)]-P-x(3)-[PSA] 

[H is the active site residue] 

[ 1] Liao D.I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990).
[ 2] Sorensen S.B., Svendsen I., Breddam K. Carlsberg Res. Commun. 54:193-202(1989).
[ 3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992).
[ 4] Svendsen I., Hofmann T., Endrizzi J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993).
[ 5] Galjart N.J., Morreau H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991).
[ 6] Cho W.L., Deitsch K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991).
[ 7] Hu W.N., Kopachik W., Band R.N. Infect. Immun. 60:2418-2424(1992).
[ 8] Wajant H., Mundry K.W., Pfitzenmaier K. Plant Mol. Biol. 26:735-746(1994).
[ 9] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

623. Serpins signature

Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of
structurally related proteins. They are high molecular weight (400 to 500 amino
acids), extracellular, irreversible serine protease inhibitors with a well defined
structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are
known to belong to the serpin family are listed below (references are only provided for
recently determined sequences):
 - Alpha-1 protease inhibitor (alpha-1-antitrypsin, contrapsin).
 - Alpha-1-antichymotrypsin.
 - Antithrombin III.
 - Alpha-2-antiplasmin.
 - Heparin cofactor II.
 - Complement C1 inhibitor.
 - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2).
 - Glia derived nexin (GDN) (Protease nexin I).
 - Protein C inhibitor.
 - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors.
 - Human squamous cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells.
 - A lepidopteran protease inhibitor.
 - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular protein.
 - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin.
 - Cowpox virus crmA [6], an inhibitor of the thiol protease interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a non-serine proteinase.
 - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of the blood clotting cascade and/or of the complement cascade in the mammalian host.
On the basis of strong sequence similarities, a number of proteins with
no known inhibitory activity are said to belong to this family:
 - Birds ovalbumin and the related genes X and Y proteins.
 - Angiotensinogen, the precursor of the angiotensin active peptide.
 - Barley protein Z, the major endosperm albumin.
 - Corticosteroid binding globulin (CBG).
 - Thyroxine-binding globulin (TBG).
 - Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP).
 - Hsp47, an endoplasmic reticulum heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway [7].
 - Maspin, which seems to function as a tumor suppressor [8].
 - Pigment epithelium-derived factor precursor (PEDF), a protein with a strong neurotrophic activity [9].
 - Ep45, an estrogen-regulated protein from Xenopus [10].
A signature pattern has been
developed for this family of proteins, centered on a well conserved Pro-Phe sequence which
is found ten to fifteen residues on the C-terminal side of the reactive bond.

Consensus pattern: { : «V:M4^ 

[iiVMI=-¥jtL[VMj;;Y SEQJD.NOUSJJ- f«V ? ^ 
ft4-VM-FAi-l;}.[nyMFAIlS 

[ 1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985).
[ 2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987).
[ 3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989).
[ 4] Remold-O'Donnell E. FEBS Lett. 315:105-108(1993).
[ 5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger P. EMBO J. 15:2944-2953(1996).
[ 6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A., Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994).
[ 7] Clarke E., Sandwal B.D. Biochim. Biophys. Acta 1129:246-248(1992).
[ 8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E., Thor A. Science 263:526-529(1994).
[ 9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad. Sci. U.S.A. 90:1526-1530(1993).
[10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A. J. Biol. Chem. 267:7053-7059(1992).

624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters
recognized by core RNA polymerase associated with the alternative sigma-54 factor. These
have a conserved domain of about 230 residues involved in the ATP-dependent [1,2]
interaction with sigma-54. This domain has been found in the proteins listed below:
 - acoR from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC.
 - algB from Pseudomonas aeruginosa, an activator of alginate biosynthetic gene algD.
 - dctD from Rhizobium, an activator of dctA, the C4-dicarboxylate transport protein.
 - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol utilization.
 - fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes.
 - flbD from Caulobacter crescentus, an activator of flagellar genes.
 - hoxA from Alcaligenes eutrophus, an activator of the hydrogenase operon.
 - hrpS from Pseudomonas syringae, an activator of hprD as well as other hrp loci involved in plant pathogenicity.
 - hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL.
 - hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase activity.
 - levR from Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC).
 - nifA (as well as anfA and vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon.
 - ntrC, from various bacteria, an activator of nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon.
 - pgtA from Salmonella typhimurium, the activator of the inducible phosphoglycerate transport system.
 - pilR from Pseudomonas aeruginosa, an activator of pilin gene transcription.
 - rocR from Bacillus subtilis, an activator of genes for arginine utilization.
 - tyrR from Escherichia coli, involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport.
 - wtsA, from Erwinia stewartii, an activator of plant pathogenicity gene wtsB.
 - xylR from Pseudomonas putida, the activator of the tol plasmid xylene catabolism operon xylCAB and of xylS.
 - Escherichia coli hypothetical protein yfhA.
 - Escherichia coli hypothetical protein yhgB.
About half of these proteins (algB, dctD, flbD, hoxA, hupR1,
hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems [3] and
possess a domain that can be phosphorylated by a sensor-kinase protein in their N-terminal
section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their
C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase
activity. This may be required to promote a conformational change necessary for
the interaction [4]. The domain contains an atypical ATP-binding motif A (P-loop) as well as
a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the
domain; signature patterns have been developed for both motifs. Other regions of the domain
are also conserved. One of them, located in the C-terminal section, has been selected as a
third signature pattern.

Consensus pattern: f fc£VMF¥]-[y V M (3 )-x-G - [DEQ] - [STE] -G- 

Consensus pattern: [GS]-x-fLI-V-^ 

fBNfiQASWf DNBQASH SFX>IDNK>i^^ ID NO;570l l-G- 

| ; S4-IMHS:iIM..SEnjt} NO:57I )j- f-t4VMF¥j[y VM 

i-l;;!VM][lJVM.S.B2iD 

Consensus pattern: [FYW]-P-[GS]-N-|^tI-¥M41 I J v M SEQ ID NO :4 Q-R-[EQ]-L-x- 

[ 1] Morrett E., Segovia L. J. Bacteriol. 175:6067-6074(1993).
[ 2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. 19:2281-2287(1991).
[ 3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989).
[ 4] Austin S., Dixon R. EMBO J. 11:2219-2228(1992).

625. Sigma-70 factors family signatures 

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of 
the core RNA polymerase to specific initiation sites and arethen released. They alter the 
specificity of promoter recognition. Most bacteria express a multiplicity of sigma factors. 
Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma 
factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. 
The other sigma factors, known as alternative sigma factors, are required for the transcription 
of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped 
into two classes: the sigma-54 and sigma-70 families. The sigma-70 family includes, in 
addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed 
below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E 
(sigE or spoIIGB), sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or 
spoOC) and sigma-K (sigK or spoIVCB/spoIIIC). - Escherichia coli and related bacteria 
sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes. - Escherichia 
coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene. 
- Escherichia coli sigma-S (gene rpoS or katF) which seems to be involved in the expression 
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of genes required for protection against external stresses. - Myxococcus xanthus sigma-B 
(sigB) which is essential for the late-stage differentiation of that bacteria. Alignments of the 
sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of 
these four regions can in turn be subdivided into a number of sub-regions. Signature patterns 
based on the two best-conserved sub-regions have been developed. The first pattern 
corresponds to sub-region 2.2;the exact function of this sub-region is not known although it 
could be involved in the binding of the sigma factor to the core RNA polymerase. The second 
pattern corresponds to sub-region 4.2 which seems to harbor a DNA-binding 'helix-turn-helix' 
motif involved in binding the conserved -35 region of promoters recognized by the major
sigma factors. The second pattern starts one residue before the N-terminal extremity of the 
HTH region and ends six residues after its C-terminal extremity. 

Consensus pattern: [DE]-j«¥MP}[L[VMF.SEQ .]D N<3:2il(2)-fHEQS-}[|iEQS SEQ ID 
NO;^ E S EQ 

111 NO;S74jj-x- PGgAM~j[ G SA M SEQ ID NQ:575 ;}]-fLlVMAP:j[L[VMAP SEP I D NO:253\ j 
Consensus pattern: [STO]-x(2)-[DEQ]-(«^ 

f Ef^MAj-LU VMA SEO j I) N 

x(3)-frPtAMFW4iT.rv7vt FW SEP ID NO: ! 3)Vx(2)-fM¥M||I.J,VM SEP ID NO A)} 
[ 1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988).[ 2] Gribskov
M., Burgess R.R. Nucleic Acids Res. 14:6745-6763(1986).[ 3] Lonetto M.A., Gribskov M.,
Gross C.A. J. Bacteriol. 174:3843-3849(1992).[ 4] Lonetto M.A., Brown K.L., Rudd K.E.,
Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).

626. Signal carboxyl-terminal domain. 430 members. 

627. Signal peptidases I signatures 

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides 
from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB)
which is responsible for the processing of the majority of exported pre-proteins; type II (gene
lsp) which only processes lipoproteins, and a third type involved in the processing of pili
subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with
the main part of the protein protruding into the periplasmic space. Two residues have been
shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I
is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2
(genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the 
targeting of proteins from the mitochondrial matrix, across the inner membrane, into the 
inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an 
oligomeric enzymatic complex composed of at least five subunits: the signal peptidase 
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two 
components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits as well
as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with
prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these proteins have 
been developed. The first signature contains the putative active site serine, the second 
signature contains the putative active site lysine which is not conserved in the SPC subunits, 
and the third signature corresponds to a conserved region of unknown biological significance
which is located in the C-terminal section of all these proteins. 
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue] 
Consensus pattern: K-R-|4^VMS4^ [I,IVMSTA SEQ ID NQ:433 ) ](2)-G-x-[PG]-G-[DH]-x- 
| L-lVM]jjJ VM..SEQJD [K is an active site 

residue] 

Consensus pattern: H^VMF¥¥}[LIVM 
[SND]-x(2)-[SG] 

[ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992).[ 2] Sung M.,
Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).[ 3] Black M.T. J. Bacteriol. 175:4957-
4961(1993).[ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).[ 5] van Dijl
J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).[ 6]
Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).[E1]

628. (sodcu) Copper/Zinc superoxide dismutase signatures 

Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that 
catalyzes the dismutation of superoxide radicals. SODC binds one atom each of zinc and 
copper. Various forms of SODC are known: a cytoplasmic form in eukaryotes, an additional
chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic form
in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. 
Two signature patterns have been derived for this family of enzymes: the first one contains 
two histidine residues that bind the copper atom; the second one is located in the C-terminal
section of SODC and contains a cysteine which is involved in a disulfide bond. 
Consensus pattern: [GA]-RMFA-'^ 

NQU27)^ SFQ I D NO :57S)l [The two H's are 

copper ligands] 

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide 
bond] 

[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987).[
2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).

629. (sodfe) Manganese and iron superoxide dismutases signature 

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that 
catalyzes the dismutation of superoxide radicals. The four ligands of the manganese atom are 
conserved in all the known SODM sequences. These metal ligands are also conserved in the 
related iron form of superoxide dismutases [2,3]. A short conserved region which includes
two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands] 
[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987).[
2] Parker M.W., Blake C.C.F. FEBS Lett. 229:377-382(1988).[ 3] Smith M.W., Doolittle 
R.F. J. Mol. Evol. 34:175-184(1992). 
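
The consensus patterns in these entries use a compact notation: residues are separated by '-', 'x' stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses gives a repeat count or range. A minimal sketch of how such a pattern could be translated into a regular expression is given below. The function name prosite_to_regex and the test fragment are illustrative assumptions, not part of the original entries, and the sketch handles only the notation elements that appear in the intact patterns of this text.

```python
import re

# One pattern element: a residue, 'x', a [allowed] set or a {forbidden} set,
# optionally followed by a repeat count "(n)" or range "(n,m)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            core = residue                      # literal residue or [allowed set]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

# Manganese/iron superoxide dismutase signature quoted in the text.
sod_regex = prosite_to_regex("D-x-W-E-H-[STA]-[FY](2)")

# Hypothetical sequence fragment, made up for illustration only.
hit = re.search(sod_regex, "AADPWEHAYYLK")
print(sod_regex, hit.group(0) if hit else None)
```

A match only indicates that the sequence is compatible with the signature; it does not by itself establish that the aspartate and histidine act as metal ligands.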

630. Spectrin repeat 

Spectrin repeats are found in several proteins involved in
cytoskeletal structure. These include spectrin, alpha-actinin
and dystrophin. The sequence repeat used in this family is taken from the structural repeat in
reference [2]. The spectrin repeat forms a three helix bundle. The second helix is interrupted 
by proline in some sequences. 
Number of members: 898 
[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile
1995;2:732-732. [2] Crystal structure of the repetitive segments of spectrin. Yan Y, 
Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science 1993;262:2027-2030. 

631. (subtilase) Streptomyces subtilisin-type inhibitors signature 

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1]
characterized by their strong activity toward subtilisin. They are collectively known as SSI's:
Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit trypsin or chymotrypsin. In their
mature secreted form, SSI's are proteins of about 110 residues with two conserved disulfide
bonds.

              +-------+         +--------------+
              |       |         |              |
xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx
              ************

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in 
a disulfide bond] 

[ 1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-
918(1994).
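
Because the SSI signature begins and ends near the two cysteines it annotates, a match can also be used to report where those residues sit in a sequence. The sketch below hand-translates the consensus pattern above into a regular expression and returns the 1-based positions of the two pattern cysteines. The function name and the test sequence are hypothetical and serve only to illustrate the bookkeeping.

```python
import re

# Consensus pattern from the text, hand-translated to a regular expression:
# C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L
SSI_SIGNATURE = re.compile(r"C.P.{2,3}G.HP.{4}AC[ATD].L")

def pattern_cysteines(sequence: str):
    """Return 1-based positions of the two pattern cysteines, or None if no match."""
    match = SSI_SIGNATURE.search(sequence)
    if match is None:
        return None
    first_cys = match.start() + 1   # the pattern starts with a cysteine
    # The second cysteine is the fourth residue from the end of the match
    # (C-[ATD]-x-L); match.end() is an exclusive 0-based index.
    second_cys = match.end() - 3
    return first_cys, second_cys

# Hypothetical sequence, constructed only so that it satisfies the pattern.
example = "MMCAPSEGAHPQQQQACTRLKK"
print(pattern_cysteines(example))
```

The variable-length element x(2,3) is why the second position is computed from the end of the match rather than as a fixed offset from the first cysteine.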

632. Sugar transport proteins signatures 

In mammalian cells the uptake of glucose is mediated by a family of closely related transport 
proteins which are called the glucose transporters [1,2,3]. At least seven of these transporters
are currently known to exist (in human they are encoded by the GLUT1 to GLUT7
genes). These integral membrane proteins are predicted to comprise twelve membrane
spanning domains. The glucose transporters show sequence similarities [4,5] with a number 
of other sugar or metabolite transport proteins listed below (references are only provided for 
recently determined sequences). - Escherichia coli arabinose-proton symport (araE). - 
Escherichia coli galactose-proton symport (galP). - Escherichia coli and Klebsiella 
pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit). 
- Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli 
proline/betaine transporter (gene proP) [6]. - Escherichia coli xylose-proton symport (xylE). - 
Zymomonas mobilis glucose facilitated diffusion protein (gene glf). - Yeast high and low 
affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose
transporter (gene GAL2). - Yeast maltose permeases (genes MAL3T and MAL6T). - Yeast 
myo-inositol transporters (genes ITR1 and ITR2). - Yeast carboxylic acid transporter protein 
homolog JEN1. - Yeast inorganic phosphate transporter (gene PHO84). - Kluyveromyces
lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene Qa-y),
and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene 
HUP1). - Arabidopsis thaliana glucose transporter (gene STP1). - Spinach sucrose 
transporter. - Leishmania donovani transporters D1 and D2. - Leishmania enriettii probable
transport protein (LTP). - Yeast hypothetical proteins YBR241c, YCR98c and YFL040w. - 
Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins
yabE, ydjE and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - 
Bacillus subtilis hypothetical proteins yxbC and yxdF. It has been suggested [4] that these 
transport proteins have evolved from the duplication of an ancestral protein with six
transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs.
The first one is located between the second and third transmembrane domains and the second 
one between transmembrane domains 8 and 9. Two patterns have been developed to detect 
this family of proteins. The first pattern is based on the G-R-[KR] motif; but because this 
motif is too short to be specific to this family of proteins, a pattern from a larger region 
centered on the second copy of this motif was derived. The second pattern is based on a 
number of conserved residues which are located at the end of the fourth transmembrane 
segment and in the short loop region between the fourth and fifth segments. 
Consensus pattern: I EI VMST AG] [L{ V MSI 'AG SEC) ID N Q:4 4) j- 

MQlLM R-[RK]-x(4,6)- 

\Q STA|{G STA . S E _Q .ID NO;] 9}]. 

Consensus pattern: H^fVMR[ 'LIVMF SEP ID NO:2Yj-x-G-fem^ SE P ID 

NO;SLU-x(2)-G-x(^ [RK] 
[ 1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991).[ 2] Gould G.W., Bell G.I. Trends 
Biochem. Sci. 15:18-23(1990).[ 3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993).[
4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore D.C.M., Henderson P.J.F. Nature
325:641-643(1987).[ 5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991).[ 6]
Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood
J.M. J. Mol. Biol. 229:268-276(1993). 
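
The remark that the G-R-[KR] motif is too short to be specific can be made concrete with a rough estimate of how often it would occur by chance. The sketch below assumes, purely for illustration, that all 20 standard residues are equally likely and independent at every position; real proteomes deviate from this, so the number is an order-of-magnitude guide only. The function name and the sequence length of 500 are assumptions.

```python
from math import prod

AMINO_ACIDS = 20  # assumption: 20 standard residues, all equally likely

def expected_chance_hits(allowed_per_position, sequence_length):
    """Expected number of chance matches of a fixed-length motif in one random sequence."""
    # Probability that a given window matches: product of (allowed residues / 20).
    p_match = prod(n / AMINO_ACIDS for n in allowed_per_position)
    # Number of start positions at which the motif fits into the sequence.
    windows = max(sequence_length - len(allowed_per_position) + 1, 0)
    return p_match * windows

# G-R-[KR]: one allowed residue, one allowed residue, two allowed residues.
short_motif = [1, 1, 2]
print(expected_chance_hits(short_motif, 500))
```

Under these assumptions a single window matches with probability 1/4000, so roughly one random 500-residue sequence in eight would contain the motif by chance. Each additional constrained position multiplies the match probability by a further factor below one, which is why a pattern drawn from a larger region is more selective.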

633. Synaptobrevin signature

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function
is not yet known, but which is highly conserved in mammals, electric ray (where it is known
as VAMP-1), Drosophila and yeast [2]. In yeast there are two closely related forms of
synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least four (genes SYB1,
SYB2, SYB3 and SYBL1). Structurally synaptobrevin consists of an N-terminal cytoplasmic
domain of 90 to 110 residues, followed by a transmembrane region, and then by a short
(from 2 to 22 residues) C-terminal intravesicular domain. As a signature pattern for
synaptobrevin, a highly conserved stretch of residues located in the central part of the
sequence was selected.
Consensus pattern: N^fcA^MftU^ 
[KL]-V-x-[DEQ]-R-x(2)-[KR]-^ 

1 5 NO: 581)]- x4WMj[L[VM SEP I'D NO:4)j-x-[DE]-[KR]-[TA]-[DE] 

[ 1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989).[ 2] Gerst
J.E., Rodgers L., Riggs M., Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992). 

634. TBC domain

Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST,
which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are
GTPase activator proteins of Rab-like small GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the 
tre-2 oncogene and the yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI;
Oncogene 1995;11:1139-1148. 

[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and
Ypt/Rab-specific GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244. 

635. Transcription factor TFIID repeat signature (TBP) 

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that 
plays a major role in the activation of eukaryotic genes transcribed by RNA polymerase II. 
TFIID binds specifically to the TATA box promoter element which lies close to the position 
of transcription initiation. There is a remarkable degree of sequence conservation of a C- 
terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region 
is necessary and sufficient for TATA box binding. The most significant structural feature of
this domain is the presence of two conserved repeats of a 77 amino-acid region. The 
intramolecular symmetry generates a saddle-shaped structure that sits astride the DNA [3].
Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also 
binds to the TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP 
homolog [5]. A signature pattern that spans the last 50 residues of the repeated region has 
been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-{W¥M|[LIVM.SEQ J.D NO:4.)](2)-x-[KRH]-x(3)-P- 
[RKQ]-x(3> L-fWMlfUVM SEP I D N 0 : 4) ] -F -x- T STN1 -G- [KR] -fl^VM-}! T J VM SEP ID 
NOj.^ [AGC]-x(7)-[LIVM 
[ 1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G.
Nature 346:387-390(1990).[ 2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-
H. Nature 346:390-394(1990).[ 3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A.,
Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992).[ 4] Crowley
T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993).[ 5] Marsh
T.L., Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-
4184(1994).



636. Translationally controlled tumor protein signatures (TCTP) 

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has 
been found to be preferentially synthesized in cells during the early growth phase of some 
types of tumor [1,2], but which is also expressed in normal cells. The physiological function 
of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close homologs have 
been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra, budding
yeast (YKL056c) [5] and fission yeast (SPAC1F12.02c). Two of the best conserved regions
have been selected as signature patterns for TCTP. 

Consensus pattern: [IFA]-[GA]-[GAS]-N4PAK]-S-[GA]-E-[G^ ID 

m}l^)i-m^QQM\DEO GA SEP ID NO: 584)] 

Consensus pattern: fR:;V-^ jo 

NQ;5MU-G-E-x-[MA]^ [AV]-x(3> 
[FYW] 

[ 1] Boehm H., Beendorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H.
Biochem. Int. 19:277-286(1989).[ 2] Makrides S., Chitpatima S.T., Bandyopadhyay R.,
Brawerman G. Nucleic Acids Res. 16:2350-2350(1988).[ 3] Pay A., Heberle-Bors E., Hirt H.
Plant Mol. Biol. 19:501-503(1992).[ 4] Stuerzenbaum S.R., Kille P., Morgan A.J. Biochim.
Biophys. Acta 1398:294-304(1998).[ 5] Rasmussen S.W. Yeast 10:S63-S68(1994).

637. TFIIS zinc ribbon domain signature 

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA 
polymerase II transcription elongation, past template-encoded pause sites. TFIIS shows 
DNA-binding activity only in the presence of RNA polymerase II. It is a protein of about 300 
amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where it 
was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA 
strand transfer protein alpha [2]) and in the archaebacterium Sulfolobus acidocaldarius [3]. This
family also includes the eukaryotic and archaebacterial RNA polymerase subunits of the 15
Kd / M family (see <PDOC00790>) as well as the following viral proteins: - Vaccinia virus
RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus protein I243L
[5]. The best conserved region of all these proteins contains four cysteines that bind a zinc ion
and fold in a conformation termed a 'zinc ribbon' [6]. Besides these cysteines, there are a 
number of other conserved residues which can be used to help define a specific pattern for 
this type of domain. 

Consensus pattern: C-x(2)-C>x(9>^4V-^Q^yyjIJVMQSAR SEP ID NQ:587)]-[QH]- 
fS^:4 iSTOf, SEP ID KO :588)]-rRA1-I^AGR4 rSACR SEP ID NO:589i] -x-[DE]- [DET]- 
f^NtMMii [The four Cs are zinc 

ligands] 

[ 1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988).[ 2]
Kipling D., Kearsey S.E. Nature 353:509-509(1991).[ 3] Langer D., Zillig W. Nucleic Acids
Res. 21:2251-2251(1993).[ 4] Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol.
10:5433-5441(1990).[ 5] Rodriguez J.M., Salas M.L., Vinuela E. Virology 186:40-52(1992).[
6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).

638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH) 
Enzymes that participate in the transfer of one-carbon units are involved in various 
biosynthetic pathways. In many of these processes the transfers of one-carbon units are 
mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon 
derivatives of THF which can be interconverted between different oxidation states by 
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase
(EC 1.5.1.5 or EC 1.5.1.15) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The
dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional
enzymes: - Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which catalyzes all
three reactions described above. Two forms of C1-THF synthases are known [1]: one is
located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the
dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the 900 amino
acid protein and consists of about 300 amino acid residues. The C1-THF synthases are
NADP-dependent. - Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase
[2]. This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues. -
Bacterial folD [3]. FolD is a homodimeric bifunctional NADP-dependent enzyme of about
290 amino acid residues. The sequence of the dehydrogenase/cyclohydrolase domain is
highly conserved in all forms of the enzyme. Two conserved regions have been selected as
signature patterns. The first one is located in the N-terminal part of these enzymes and 
contains three acidic residues. The second pattern is a highly conserved sequence of 9 amino 
acids which is located in the C-terminal section. 
Consensus pattern: [EQ]-x-[EQK]-fUVM^ 

SEUJUL^ X (5). 
fiv*¥4v4Rf UVMF SEP I D NO:2)i nVO-L-P-[LV] 
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV] 

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).[ 2] Belanger C.,
Mackenzie R.E. J. Biol. Chem. 264:4837-4843(1989).[ 3] d'Ari L., Rabinowitz J.C. J. Biol.
Chem. 266:23953-23958(1991).

639. Triosephosphate isomerase active site (TIM) 

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the
reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate.
TIM plays an important role in several metabolic pathways and is essential for efficient 
energy production. It is a dimer of identical subunits, each of which is made up of about 250 
amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The 
sequence around the active site residue is perfectly conserved in all known TIM's and can be 
used as a signature pattern for this type of enzyme. 

Consensus pattern: fAV1-Y-E4HUA^|I J VM SE P ID NQ: 4)I-W-[SA]-T-G-T-[GK] [E is 
the active site residue] 

[ 1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 
29:6609-6618(1990).[ 2] Knowles J.R. Nature 350:121-124(1991). 

640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-
dependent phosphorylation of thymidine. A comparison of TK sequences has shown [1,2,3]
that there are two different families of TK. One family groups together TK from herpes 
viruses as well as cellular thymidylate kinases, while the second family currently consists of 
TK from the following sources: - Vertebrates. - Bacteria. - Bacteriophage T4. - Pox viruses.
- African swine fever virus (ASF). - Fish lymphocystis disease virus (FLDV). A conserved
region which is located in the C-terminal section of these enzymes has been selected as a
signature pattern for this family of TK.

Consensus pattern: [GA]-x(l,2)-[DE]-x-Y-x-[STAP}[CT 

x-[CHH«¥MF¥^ 

[ 1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-
365(1987).[ 2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C.,
Vinuela E. Virology 178:301-304(1990).[ 3] Robertson G.R., Whalley J.M. Nucleic Acids 
Res. 16:11303-11317(1988). 



641. Thymidine kinase from herpesvirus (TK herpes)
[1] 

Medline: 96003730 

Crystal structures of the thymidine kinase from herpes 
simplex virus type-1 in complex with deoxythymidine and 
ganciclovir. 

Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz 
C, Summers WC, Sanderson MR; 
Nat Struct Biol 1995;2:876-881. 
Number of members: 65 



642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of 
nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form 
found in the sperm nucleus. This condensation is associated with a double-protein transition. 
The first transition corresponds to the replacement of histones by several spermatid-specific 
proteins, also called transition proteins, which are themselves replaced by protamines during 
the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific 
proteins. TP2 is a basic, zinc-binding protein [1] of 116 to 137 amino-acid residues.
Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain 
of about 25 residues, a variable central domain of 20 to 50 residues which contains cysteine 



Reference No. 2750-942P 



residues, and a conserved C-terminal domain of about 70 residues rich in lysines and 
arginines. Two signature patterns for TP2 have been developed: one located in the N-terminal 
domain, the other in the C-terminal. 
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S 
Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K 

[ 1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991). 

643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It
has been shown [1] that some of these enzymes are structurally related. These related TPP
enzymes are: - Pyruvate oxidase (POX) (EC 1.2.3.3) Reaction catalyzed: pyruvate +
orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2). - Pyruvate
decarboxylase (PDC) (EC 4.1.1.1) Reaction catalyzed: pyruvate = acetaldehyde + CO(2). -
Indolepyruvate decarboxylase (EC 4.1.1.74) [2] Reaction catalyzed: indole-3-pyruvate =
indole-3-acetaldehyde + CO(2). - Acetolactate synthase (ALS) (EC 4.1.3.18) Reaction
catalyzed: 2 pyruvate = acetolactate + CO(2). - Benzoylformate decarboxylase (BFD) (EC
4.1.1.7) [3] Reaction catalyzed: benzoylformate = benzaldehyde + CO(2). A conserved
region which is located in their C-terminal section has been selected as a signature pattern for 
these enzymes. 

Consensus pattern: fWVMl^IUV^ 

[GSA]- [G8A€}[GSAC SEQJD.NO:93ij. 

[ 1] Green J.B.A. FEBS Lett. 246:1-5(1989).[ 2] Koga J., Adachi T., Hidaka H. Mol. Gen.
Genet. 226:10-16(1991).[ 3] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt 
P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990). 

644. TPR Domain 
[1] 

Medline: 95397415 

Tetratrico peptide repeat interactions: to TPR or not to TPR? 
Lamb JR, Tugendreich S, Hieter P; 
Trends Biochem Sci 1995;20:257-259.
[2] Medline: 98151343
The structure of the tetratricopeptide repeats of protein 
phosphatase 5: implications for TPR-mediated protein-protein 
interactions. 
Das AK, Cohen PW, Barford D; 
EMBO J 1998;17:1192-1199. 
Number of members: 621 

645. Uroporphyrin-III C-methyltransferase signatures (TP methylase) 

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of
two methyl groups from S-adenosyl-L-methionine to the C-2 and C-7 atoms of
uroporphyrinogen III to yield precorrin-2 via the intermediate formation of precorrin-1. 
SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common 
intermediate in the biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme 
F430. The sequences of SUMT from a variety of eubacterial and archaebacterial species are
currently available. In species such as Bacillus megaterium (gene cobA), Pseudomonas 
denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 
25 to 30 Kd. In Escherichia coli and related bacteria, the cysG protein, which is involved in 
the biosynthesis of siroheme, is a multifunctional protein composed of an N-terminal domain,
probably involved in transforming precorrin-2 into siroheme, and a C-terminal domain which 
has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans 
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also 
seem to be SAM-dependent methyltransferases [3,4]. The similarity is especially strong with 
two of these enzymes: cobI/cbiL which encodes S-adenosyl-L-methionine-precorrin-2
methyltransferase and cobM/cbiF whose exact function is not known. Two signature patterns 
have been developed for these enzymes. The first corresponds to a well conserved region in 
the N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region 
located in the central part of these proteins (this pattern spans what are called regions 2 and 3 
in [1,3]). 
Consensus pattern: P^A^}[L1VM

G-P-G-x(3)-fUVMffl j T JVMFY SEP ID NO: i 8 s] 4UVMH LIVM SEP ID NQ:4Y1 -T- 
ji^M^U^ f^RH^ jKRHQG SEP ID N O:592 ij-[AG] 

Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-{fcl¥F]{LIVF SEQ.IO 
NQ:127)j-x(5 ? 6)- t^-VMf^-WP^ 
ft fVM [ U VMY SEQ. j DN 0;Jii)l-x-P-G 

[ 1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J.
Bacteriol. 173:4637-4645(1991).[ 2] Robin C., Blanche F., Cauchois L., Cameron B., Couder
M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).[ 3] Crouzet J., Cameron B., Cauchois L.,
Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol. 172:5980-
5990(1990).[ 4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J.
Bacteriol. 175:3303-3316(1993).[ 5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. 
Biol. 12:4026-4037(1992). 

646. Tudor domain 

Domain of unknown function present in several RNA-binding proteins, copies in the
Drosophila Tudor protein. Slight ambiguities in the alignment.
Number of members: 18
[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP;
Trends Biochem Sci 1997;22:51-52. [2] Medline: 97157029 The human EBNA-2
coactivator p100: multidomain organization and relationship to the staphylococcal nuclease
fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I, 
Mornon JP; Biochem J 1997;321:125-132. 

647. Terpene synthase family 

It has been suggested that this gene family be designated 
tps (for terpene synthase) [1]. It has been split into six 
subgroups on the basis of phylogeny, called tpsa-tpsf. 
tpsa includes vetispiradiene synthase Swiss:Q39979, 5-epi-
aristolochene synthase, Swiss:Q40577 and (+)-delta-cadinene
synthase Swiss:P93665. 

tpsb includes (-)-limonene synthase, Swiss:Q40322. 




tpsc includes kaurene synthase A, Swiss:O04408. 
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, 
Swiss:O24475 and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B. 
tpsf includes linalool synthase. 
Number of members: 51
[1] 

Medline: 97413772 

Monoterpene synthases from grand fir (Abies grandis). cDNA 
isolation, characterization, and functional expression of 
myrcene synthase, (-)-(4S)-limonene synthase, and 
(-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R; 
J Biol Chem 1997;272:21784-21792. 

648. ThiF family 

This family contains a repeated domain in ubiquitin 
activating enzyme El and members of the bacterial 
ThiF/MoeB/HesA family. Number of members: 87 

649. Thioester dehydrase 

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19 
[1] 

Medline: 96398612 

Structure of a dehydratase-isomerase from the bacterial 
pathway for biosynthesis of unsaturated fatty acids: two 
catalytic activities in one active site. 
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL; 
Structure 1996;4:253-264. 
Database Reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM]
Database reference: PFAMB; PB058036;

650. Tub family signatures

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and 
sensory deficits. This mutation maps to a gene, tub [1,2], which codes for a protein that
belongs to a family which currently consists of the following members: - Mammalian tub, a
hydrophilic protein of about 500 residues, which could be involved in the hypothalamic
regulation of body weight. - Human protein TULP1 [3] which may be involved in retinitis
pigmentosa 14, a retinal degeneration disease. - Mouse protein p4-6 whose function is not
known. - Caenorhabditis elegans hypothetical protein F10B5.4. - Several fragmentary
sequences from plants, Drosophila and human ESTs. While the N-terminal part of these
proteins is conserved neither in length nor in sequence, the C-terminal 250 residues are highly
conserved. Therefore, two regions were selected in the C-terminal part as signature patterns.
The second region is located at the C-terminal extremity and contains a penultimate cysteine
residue that could be critical to the normal functioning of these proteins. 
Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q 

Consensus pattern: A-F-rAG]-I4SACl-fi^V^ HLiV?vi SEP ID NO;4 \ ]-[ST]-S-F-x-[GST]-K- 
x-A-C-E 

[ 1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R.,
Misumi D.J., Holmgren L., Charlat O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F.,
Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J., Lakey N.D., Culpepper J., Chen
H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996).
[ 2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996).
[ 3] North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci.
U.S.A. 94:3128-3133(1997).

651. Eukaryotic DNA topoisomerase I active site 

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type I topoisomerases act by
catalyzing the transient breakage of DNA, one strand at a time, and the subsequent rejoining
of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it
simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is
joined to a 3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In
eukaryotes and poxvirus topoisomerases I, there are a number of conserved residues in the
region around the active site tyrosine.

Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM (SEQ ID NO:4)]-x(3)-[LIVM (SEQ ID NO:4)] [Y is the active site tyrosine]

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[ 2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[ 3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989).
[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).
[E1]

652. Transaldolase signatures 

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from
sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and
fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the
glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 Kd
whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the
carbonyl group of fructose 6-phosphate. Transaldolase is evolutionarily related [2] to a
bacterial protein of about 20 Kd (known as talC in Escherichia coli), whose exact function is
not yet known. Two signature patterns have been developed for these proteins. The first,
located in the N-terminal section, contains a perfectly conserved pentapeptide; the second
includes the active site lysine.
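
The carbon bookkeeping of the transaldolase reaction described above can be checked with a short sketch. The carbon counts are taken from the sugar names (heptulose, triose, tetrose, hexose); the dictionary and variable names are illustrative only and are not part of the original entry.

```python
# Carbon counts implied by the sugar names; the phosphate groups carry no carbon.
CARBONS = {
    "sedoheptulose 7-phosphate": 7,
    "glyceraldehyde 3-phosphate": 3,
    "erythrose 4-phosphate": 4,
    "fructose 6-phosphate": 6,
}
KETOL_UNIT = 3  # size of the unit transferred by transaldolase

def total_carbons(compounds):
    """Sum the carbon atoms over a list of compound names."""
    return sum(CARBONS[name] for name in compounds)

donor = "sedoheptulose 7-phosphate"
acceptor = "glyceraldehyde 3-phosphate"
products = ["erythrose 4-phosphate", "fructose 6-phosphate"]

donor_remainder = CARBONS[donor] - KETOL_UNIT        # what is left of the donor
acceptor_extended = CARBONS[acceptor] + KETOL_UNIT   # acceptor after the transfer
balanced = total_carbons([donor, acceptor]) == total_carbons(products)

print(donor_remainder, acceptor_extended, balanced)
```

Removing the three-carbon unit from the seven-carbon donor leaves the four-carbon erythrose 4-phosphate, and adding it to the three-carbon acceptor gives the six-carbon fructose 6-phosphate.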
Consensus pattern: [DG]-[IVSA]-[illegible]-[LIVMF (SEQ ID NO:2)](2)

Consensus pattern: [illegible (SEQ ID NO:195)]-G-[illegible]-x-[LIVM (SEQ ID NO:4)] [K is the active site residue]

[ 1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993).
[ 2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).



653. (Transpeptidase) Penicillin binding protein transpeptidase domain 

The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this
family.

[1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide
alpha,alpha-trehalose, yielding two glucose subunits [1]. It is an enzyme found in a wide
variety of organisms and whose sequence has been highly conserved throughout evolution.
Two of the most highly conserved regions have been selected as signature patterns. The first
pattern is located in the central section; the second one is in the C-terminal region.
Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y 
Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P 
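
Consensus patterns written in this notation can be turned into regular expressions for scanning sequences. The following is a minimal sketch, assuming the usual PROSITE conventions ('x' for any residue, '[..]' for allowed residues, '{..}' for excluded residues, '(n)' or '(n,m)' for repeats); the function name and the test fragment are hypothetical, and N- or C-terminal anchors are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat "(n)" or "(n,m)".
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            core = "."                      # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {AB}: any residue except A or B
        # A single letter or a bracketed set such as [GA] is already valid regex.
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        pieces.append(core)
    return "".join(pieces)

TREHALASE_1 = prosite_to_regex("P-G-G-R-F-x-E-x-Y-x-W-D-x-Y")
TREHALASE_2 = prosite_to_regex("Q-W-D-x-P-x-[GA]-W-[PAS]-P")

# Hypothetical test fragment, not a real trehalase sequence.
fragment = "MKPGGRFAEVYGWDSYAA"
match = re.search(TREHALASE_1, fragment)
print(TREHALASE_1, TREHALASE_2, match.start())
```

The same helper applies to any other pattern in this listing whose text is intact, for example the tropomyosin signature L-K-E-A-E-x-R-A-E.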

[ 1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993).
[ 2] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[E1]

655. Trehalose-6-phosphate synthase domain 

OtsA (trehalose-6-phosphate synthase) is homologous to regions
in the subunits of the yeast trehalose-6-phosphate synthase/phosphatase complex [1].
[1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.

656. Tropomyosins signature 

Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle
cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex
and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and
non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil
dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid
residues and a highly conserved N-terminal region, whereas non-muscle forms are generally
smaller and are heterogeneous in their N-terminal region. The signature pattern for
tropomyosins is based on a very conserved region in the C-terminal section of tropomyosins,
which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[ 1] Smilie L.B. Trends Biochem. Sci. 4:151-155(1979).
[ 2] McLeod A.R. BioEssays 6:208-212(1986).



657. Troponin

Troponin (Tn) contains three subunits: Ca2+ binding (TnC),
inhibitory (TnI), and tropomyosin binding (TnT). This Pfam family contains
members of the TnT subunit.

The troponin complex regulates Ca2+-induced muscle contraction.
This family includes troponin T and troponin I. Troponin I
binds to actin and troponin T binds to tropomyosin.
Number of members: 81

[1] Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315
A direct regulatory role for troponin T and a dual role for
troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.

658. (Tryp mucin) Mucin-like glycoprotein

This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of
three regions. The N and C termini are conserved between all members of the family,
whereas the central region is not well conserved and contains a large number of threonine
residues which can be glycosylated [1].
Indirect evidence suggested that these genes might encode the core protein of parasite
mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion
of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.
[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1) 
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA
synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a
mitochondrial form. While all these enzymes have a common function, they are widely
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2]
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site.
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN (SEQ ID NO:296)]-[DENQGAPK (SEQ ID NO:597)]-x-[LIVMFP (SEQ ID NO:598)]-[HT]-[LIVMYAC (SEQ ID NO:599)]-G-[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[ 4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).



660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA
synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a
mitochondrial form. While all these enzymes have a common function, they are widely
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2]
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site.
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN (SEQ ID NO:296)]-[DENQGAPK (SEQ ID NO:597)]-x-[LIVMFP (SEQ ID NO:598)]-[HT]-[LIVMYAC (SEQ ID NO:599)]-G-[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[ 4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only glutamyl and glutaminyl tRNA synthetases.
In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and
tRNA(Gln).
[1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.

662. (tRNA-synt 1d) tRNA synthetases class I (R)

Other tRNA synthetase sub-families are too dissimilar to be included. 
This family includes only arginyl tRNA synthetase. 

663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2) 
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common
folding pattern in their catalytic domain for the binding of ATP and amino acid which is
different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-[illegible]-[GSTA (SEQ ID NO:19)]-[LIVMF (SEQ ID NO:2)]-[DR]-R-[LIVMF (SEQ ID NO:2)]-x-[LIVMSTAG (SEQ ID NO:441)]-[LIVMFY (SEQ ID NO:18)]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
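
The class assignment given in these entries (ten amino acids for the class-II synthetases above, and ten for the class-I synthetases in the class-I signature entries) can be captured as a simple lookup. This is only an illustrative sketch: the names CLASS_I, CLASS_II and synthetase_class are hypothetical, and the table follows the lists as stated in the text rather than any later reclassification.

```python
# Amino acids whose synthetases carry the class-I 'HIGH' signature, as listed in the text.
CLASS_I = frozenset(
    {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
)
# Amino acids whose synthetases are referred to as class-II, as listed in the text.
CLASS_II = frozenset(
    {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}
)

def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid code."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {amino_acid!r}")

# The two lists are disjoint and together cover the twenty standard amino acids.
covers_all_twenty = len(CLASS_I | CLASS_II) == 20 and CLASS_I.isdisjoint(CLASS_II)
print(synthetase_class("Glu"), synthetase_class("Ser"), covers_all_twenty)
```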

664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] 
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well 
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. 
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I 
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold. 
Consensus pattern: P-x(0,2)-[GSTAN (SEQ ID NO:296)]-[DENQGAPK (SEQ ID NO:597)]-x-[LIVMFP (SEQ ID NO:598)]-[HT]-[LIVMYAC (SEQ ID NO:599)]-G-[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[ 4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common
folding pattern in their catalytic domain for the binding of ATP and amino acid which is
different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-[illegible]-R-[LIVMF (SEQ ID NO:2)]-x-[illegible]-[LIVMFY (SEQ ID NO:18)]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

666. Thaumatin family signature

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on
a molar basis) from Thaumatococcus daniellii, an African bush. The protein is made of about
200 residues and contains 8 disulfide bonds. A number of proteins have been found to be
related to thaumatins. These proteins are listed below (references are only provided for
recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco
pathogenesis-related proteins: PR-R major and minor forms, which are induced after
infection with viruses. - Salt-induced protein NP24 from tomato. - Osmotin, a salt-induced
protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and OSML81 from potato
[2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a
maize antifungal protein [3]. The exact biological function of all these proteins is not yet
known. A conserved region that includes three cysteine residues known (in thaumatin) to be
involved in disulfide bonds has been selected as a signature pattern.

xxCxxxxxxxxxxxxxxxxCxxCxxCxCxxxxxxxxxxxxxxCxxCxCxxxCxCxxCCxCxxxCxxxxxCxxxCx
'C': conserved cysteine involved in a disulfide bond.
[Schematic of the conserved cysteines; the disulfide-bond connectors and the '*' markers showing the position of the pattern are not reproduced.]

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C
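
As a consistency check, the cysteines in the schematic can be counted and compared with the 8 disulfide bonds stated for thaumatin. This is a minimal sketch; the schematic string is copied from the entry, the variable names are illustrative, and the 'x' runs are treated only as spacers, not as exact residue counts.

```python
# Schematic of conserved cysteines as given in the entry ('C' = conserved cysteine).
SCHEMATIC = (
    "xxCxxxxxxxxxxxxxxxxCxxCxxCxCxxxxxxxxxxxxxxCxxCxCxxxCxCxxCCxCxxxCxxxxxC"
    "xxxCx"
)

# Count the cysteines; each disulfide bond uses two of them.
n_cysteines = SCHEMATIC.count("C")
max_disulfides = n_cysteines // 2

print(n_cysteines, max_disulfides)
```

The count of conserved cysteines in the schematic is thus compatible with every one of them taking part in one of the 8 disulfide bonds.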
[ 1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[ 2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995).
[ 3] Malehorn D.E., Borgmeyer J.R., Smith C.E., Shah D.M. Plant Physiol. 106:1471-1481(1994).

667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes:
acetoacetyl-CoA thiolase (EC 2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA
thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and
is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA
thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved
in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In
eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion
and the other in peroxisomes. There are two conserved cysteine residues important for
thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the
formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is
the active site base involved in deprotonation in the condensation reaction. Mammalian
nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein
which seems to exist in two different forms: a 14 Kd protein (SCP-2) and a larger 58 Kd
protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in
lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to
SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4]. Three signature
patterns have been developed for this family of proteins, two of which are based on the
regions around the biologically important cysteines. The third is based on a highly conserved
region in the C-terminal part of these proteins.

Consensus pattern: [LIVM (SEQ ID NO:4)]-[NST]-x(2)-C-[SAGLI]-[illegible]-[STAG]-[illegible] [C is involved in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM (SEQ ID NO:4)]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA (SEQ ID NO:30)]-[STAGCLIVM (SEQ ID NO:604)]-[STAG (SEQ ID NO:20)]-[LIVMA (SEQ ID NO:30)]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site residue]

[ 1] Peoples P.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989).
[ 2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G., Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990).
[ 3] Igual J.C., Gonzalez-Bosch C., Dopazo J., Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992).
[ 4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698(1991).

668. Thioredoxin family active site 

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues
which participate in various redox reactions via the reversible oxidation of an active center
disulfide bond. They exist in either a reduced form or an oxidized form where the two
cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present in
prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is
well conserved. Bacteriophage T4 also encodes a thioredoxin but its primary structure is
not homologous to bacterial, plant and vertebrate thioredoxins. A number of eukaryotic
proteins contain domains evolutionarily related to thioredoxin; all of them seem to be protein
disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme
that catalyzes the rearrangement of disulfide bonds in various proteins. The various forms of
PDI which are currently known are: - PDI major isozyme; a multifunctional protein that also
functions as the beta subunit of prolyl 4-hydroxylase (EC 1.14.11.2), as a component of
oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein.
- ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a
phosphoinositide-specific phospholipase C isozyme and later to be a protease. - ERp72. -
P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol:disulfide interchange proteins that allow disulfide bond formation in
some periplasmic proteins also contain a thioredoxin domain. These proteins are: -
Escherichia coli dsbA (or prfA) and its orthologs in Vibrio cholerae (tcpG) and Haemophilus
influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia chrysanthemi
and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus
influenzae ortholog. - Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus
influenzae, Rhodobacter capsulatus (helX), Rhizobiaceae (cycY and tlpA).
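
The redox-active disulfide described above can be located in a sequence by searching for two cysteines close together. The sketch below assumes that the two cysteines are separated by exactly two residues (a C-x-x-C arrangement, as in the Trp-Cys-Gly-Pro-Cys active site of Escherichia coli thioredoxin); the function name and the test fragment are hypothetical, and a hit is only a candidate site, not proof of a thioredoxin domain.

```python
import re

def find_cxxc(sequence: str):
    """Return (0-based start, motif) for every C-x-x-C occurrence, overlaps included."""
    # The lookahead lets overlapping candidates (e.g. CxxCxxC) all be reported.
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(C..C))", sequence)]

# Hypothetical fragment built around the WCGPC active-site motif.
fragment = "AAWCGPCKAA"
hits = find_cxxc(fragment)
print(hits)
```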
Consensus pattern: [LIVMF]-[illegible]-C-[GATPLVE (SEQ ID NO:607)]-[PHYWSTA (SEQ ID NO:608)]-C-x(6)-[LIVMFYWT (SEQ ID NO:47)] [The two C's form the redox-active bond]
[ 1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985).
[ 2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:271-297(1988).
[ 3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989).
[ 4] Eklund H., Gleason F.K., Holmgren A. Proteins 11:13-28(1991).
[ 5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[ 6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989).
[ 7] Freedman R.B., Hirst T.R., Tuite M.F. Trends Biochem. Sci. 19:331-336(1994).

669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein-encoding genes by polymerase II is
modulated by general and specific transcription factors. The general transcription factors
operate through common promoter elements (such as the TATA box). At least seven
different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID, -IIE,
-IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the
transcription of class II genes; it associates with a complex of TFIID-IIA bound to DNA (DA
complex) to form a ternary complex TFIID-IIA-IIB (DAB complex) which is then
recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75
residues. This repeat could contribute an element of symmetry to the folded protein. The
following proteins have been shown to be evolutionarily related to TFIIB: - An archaebacterial
TFIIB homolog. In Pyrococcus woesei a previously undetected open reading frame has been
shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit
(gene PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III
transcription and plays a role analogous to that of TFIIB in pol III transcription. The central
section of the repeated domain, which is the most conserved part of that domain, has been
selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN (SEQ ID NO:24)]-x-[LIVMYA (SEQ ID NO:609)]-[GSTA (SEQ ID NO:19)](2)-[illegible]

[ 1] Weinmann R. Gene Expr. 2:81-91(1992).
[ 2] Hawley D. Trends Biochem. Sci. 16:317-318(1991).
[ 3] Ha I., Lane W.S., Reinberg D. Nature 352:689-695(1991).
[ 4] Ouzounis C., Sander C. Cell 71:189-190(1992).
[ 5] Khoo B., Brophy B., Jackson S.P. Genes Dev. 8:2879-2890(1994).

670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues,
sometimes known as the MADS-box domain [E1]. They are listed below: - Serum response
factor (SRF) [1], a mammalian transcription factor that binds to the Serum Response Element
(SRE). This is a short sequence of dyad symmetry located 300 bp to the 5' end of the
transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer
factors 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors which bind
specifically to the MEF2 element present in the regulatory regions of many muscle-specific
genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). - Yeast GRM/PRTF protein
(gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor
RLM1. - Yeast transcription factor SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a
probable transcription factor involved in regulating genes that determine stamen and carpel
development in wild-type flowers. Mutations in the AG gene result in the replacement of the
stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins
Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI), which act locally to specify the identity of
the floral meristem and to determine sepal and petal development [4]. - Antirrhinum majus
and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are
transcription factors involved in the genetic control of flower development. Mutations in
DEFA or GLO cause the transformation of petals into sepals and of stamens into carpels. -
Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus
morphogenetic protein DEF H33 (squamosa). In SRF, the conserved domain has been shown
[1] to be involved in DNA-binding and dimerization. A pattern that spans the complete length
of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK (SEQ ID NO:610)]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM (SEQ ID NO:4)]-x-K(2)-A-x-E-[LIVM (SEQ ID NO:4)]-[illegible]-[LIVM (SEQ ID NO:4)](3)-x(6)-[LIVMF]-[illegible]

[ 1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988).
[ 2] Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988).
[ 3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A., Meyerowitz E.M. Nature 346:35-39(1990).
[ 4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994).
[ 5] Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwartz-Sommer Z. EMBO J. 11:4693-4704(1992).
[ 6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991).
[E1]

671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit
from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form
sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with
transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK
requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it
is a homodimer of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic
and prokaryotic sources [1,2] show that the enzyme has been evolutionarily conserved. In the
peroxisomes of the methylotrophic yeast Hansenula polymorpha, there is a highly related
enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst
its substrates. 1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far
found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamin
pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of
pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (dxp), a
precursor in the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol
(vitamin B6). DXP synthase is evolutionarily related to TK. Two regions of TK have been
selected as signature patterns. The first, located in the N-terminal section, contains a histidine
residue which appears to function in proton transfer during catalysis [4]. The second, located
in the central section, contains conserved acidic residues that are part of the active cleft and
may participate in substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA (SEQ ID NO:612)]-[illegible]-[GS]

Consensus pattern: G-[DEQGSA]-[illegible]-[STAP]-x(2)-[RGA]

[ 1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. 
Commun. 183:1159-1166(1992).[ 2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., 
Martin B.M. Biochemistry 31:1892-1896(1992).[ 3] Sprenger G.A., Schorken U., Wiegert T., 
Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm H. Proc. Natl. 
Acad. Sci. U.S.A. 94:12857-12862(1997).[ 4] Lindqvist Y., Schneider G., Ermler U., 
Sundstroem M. EMBO J. 11:2373-2379(1992). 

672. Transmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily 
related [1,2,3]. The proteins known to belong to this family are listed below: - Mammalian 
antigen CD9 (MIC3); a protein involved in platelet activation and aggregation. - Mammalian 
leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen CD53 
(OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian 
lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1). - 
Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role 
in the regulation of lymphoma cell growth. - Mammalian antigen CD82 (protein R2; antigen 
C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory 
signals for the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial 
tetraspan antigen 3 (PETA-3)). - Mammalian cell surface glycoprotein A15 (TALLA-1; 
MXS1). - Mammalian novel antigen 2 (NAG-2). - Human tumor-associated antigen CO-029. 
- Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23). These proteins 
share the following characteristics: they all seem to be type III membrane proteins (type III 
proteins are integral membrane proteins that contain an N-terminal membrane-anchoring 
domain which is not cleaved during biosynthesis and which functions both as a translocation 
signal and as a membrane anchor); they also contain three additional transmembrane regions, 
at least seven conserved cysteine residues, and are of approximately the same size (218 to 
284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' 
(TM4) because they span the plasma membrane four times. A schematic diagram of the 

domain structure of these proteins is shown below. 

| TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt | 

Cyt: cytoplasmic domain. TMa: transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. 

A conserved region that includes two cysteines and seems to be located in a short 
cytoplasmic loop between two transmembrane domains has been selected as a signature for 
these proteins. 

Consensus pattern: G-x(3H-Ll-VM-F-¥LIVMF SEP ID HQ:2)]-xf2V[GSA]4WMFHLiy.iy{F 
5 SEQ^0mQ:2)](2)-G-C-x-rGAl-rSTAl- x(2)-rEG1-x(2HCWN1-fcP^M}fLIVM SEP ID 

NO:4)j(2) 

[ 1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 
266:14597-14602(1991).[ 2] Tomlinson M.G., Williams A.F., Wright M.D. Eur. J. Immunol. 
23:136-40(1993).[ 3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D., Davis S.J., Somoza 
C., Williams A.F. The leucocyte antigen factbooks. Academic Press, London / San Diego, 
(1993). 

673. Tryptophan synthase alpha chain signature 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion 
of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It 
has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole 
and glyceraldehyde 3-phosphate and the other for the synthesis of tryptophan from indole and 
serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta 
chains), while in fungi the two domains are fused together on a single multifunctional protein. 
A conserved region that contains three conserved acidic residues has been selected as a 
signature pattern for the alpha chain. The first and the third acidic residues are believed to 
serve as proton donors/acceptors in the enzyme's catalytic mechanism. 
Consensus pattern: fW¥M}fnVM SEpJD.Np:4)j-E-fLiVM 

2 5 x(2>fFYC1-rST14DE1-rPA1-jWM¥^i-LIVM.Y SEP ID NO: 141)]- FAQfcFHAGLl SEP ID 
NO:61 8Vi-[DE1-G 

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).[ 2] Hyde C.C., Miles E.W. 
Bio/Technology 8:27-32(1990).[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. 
U.S.A. 86:4604-4608(1989). 

674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion 
of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It 
has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole 
and glyceraldehyde 3-phosphate and the other for the synthesis of tryptophan from indole and 
serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta 
chains), while in fungi the two domains are fused together on a single multifunctional protein. 
The beta chain of the enzyme requires pyridoxal-phosphate as a cofactor. The 
pyridoxal-phosphate group is attached to a lysine residue. The region around this lysine residue also 
contains two histidine residues which are part of the pyridoxal-phosphate binding site. The 
signature pattern for the tryptophan synthase beta chain is derived from that conserved region. 

-Consensus pattern: fLiVMj^^ [K is the 

pyridoxal-P attachment site] 

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).[ 2] Hyde C.C., Miles E.W. 
Bio/Technology 8:27-32(1990).[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. 
U.S.A. 86:4604-4608(1989). 

675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge 
relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which itself is 
hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and 
histidine residues are well conserved in this family of proteases [1]. A partial list of proteases 
known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors 
VII, IX, X, XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - 
Chymotrypsins. - Complement components C1r, C1s, C2, and complement factors B, D and 
I. - Complement-activating component of RA-reactive factor. - Cytotoxic cell proteases 
(granzymes A to H). - Duodenase I. - Elastases 1, 2, 3A, 3B (protease E), leukocyte 
(medullasin). - Enterokinase (EC 3.4.21.9) (enteropeptidase). - Hepatocyte growth factor 
activator. - Hepsin. - Glandular (tissue) kallikreins (including EGF-binding protein types A, 
B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin). - 
Plasma kallikrein. - Mast cell proteases (MCP) 1 (chymase) to 8. - Myeloblastin (proteinase 
3) (Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type). - 
Trypsins I, II, III, and IV. - Tryptases. - Snake venom proteases such as ancrod, batroxobin, 
cerastobin, flavoxobin, and protein C activator. - Collagenase from common cattle grub and 
collagenolytic protease from Atlantic sand fiddler crab. - Apolipoprotein(a). - Blood fluke 
cercarial protease. - Drosophila trypsin-like proteases: alpha, easter, snake-locus. - Drosophila 
protease stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins 
belong to family S1 in the classification of peptidases [2,E1] and originate from eukaryotic 
species. It should be noted that bacterial proteases that belong to family S2A are similar 
enough in the regions of the active site residues that they can be picked up by the same 
patterns. These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter 
alpha-lytic protease. - Streptogrisin A and B (Streptomyces proteases A and B). - 
Streptomyces griseus glutamyl endopeptidase II. - Streptomyces fradiae proteases 1 and 2. 
Consensus pattern: f L-I VM^^^ 
NO:20)]-H-C [H is the active site residue] 

Consensus pattern: [D NST/ \G G}[DNSTAGC JEQJD NO;619)l- 
l^£ A p}M^(m\\ GXYAPlM¥QH SEP I D NO:620yj-xr2VG-[DE]-S-G-[GS]- 
15 Ir&AmV)^ [Li-V-MPY-WM^ 

f«YMF-¥STAN^ [S is the active site residue] 

[ 1] Brenner S. Nature 334:528-530(1988).[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 
244:19-61(1994).[E1] 

676. (tsp) Thrombospondin type 1 domain 
[1] Bork P; FEBS Lett 1993;327:125-130. 

677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituents of microtubules, are dimeric proteins which consist of 
two closely related subunits (alpha and beta). Tubulin binds two molecules of GTP at two 
different sites (N and E). At the E (Exchangeable) site, GTP is hydrolyzed during 
incorporation into the microtubule. Near the E site is an invariant region rich in glycines 
which is found in both chains and which is now [3] said to control the access of the nucleotide 
to its binding site. A signature pattern was developed from this region. With the exception of 
the simple eukaryotes, most species express a variety of closely related alpha and beta 
isotypes. In most species there is a third member of the tubulin family: gamma tubulin. 
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles 
or the centrosome, suggesting that it is involved in the minus-end nucleation of microtubule 
assembly [4]. 

Consensus pattern: [SAG]-G-G-T-G-[SA]-G 

[ 1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985).[ 2] Joshi H.C., 
Cleveland D.W. Cell Motil. Cytoskeleton 16:159-163(1990).[ 3] Hesse J., Thierauf M., 
Ponstingl H. J. Biol. Chem. 262:15472-15475(1987).[ 4] Joshi H.C. BioEssays 15:637- 
643(1993). 
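
As an illustration of how a consensus pattern such as the tubulin signature can be applied to a sequence, the following Python sketch translates the pattern notation into a regular expression and searches with it. The function name prosite_to_regex and the test sequence are hypothetical and chosen only for this example; the translation is an assumption that covers only the notation elements appearing in these entries (single residues, bracketed alternatives, braces for excluded residues, x, repeat counts, and the < and > anchors).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray leading or trailing hyphen
        prefix = suffix = ""
        if element.startswith("<"):   # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (4,12).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # A bracketed set such as [SAG] or a single letter is already valid regex.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

TUBULIN_PATTERN = "[SAG]-G-G-T-G-[SA]-G"
tubulin_regex = prosite_to_regex(TUBULIN_PATTERN)

toy_sequence = "MKLAGGTGSGAAA"  # made-up sequence, not a real tubulin
hit = re.search(tubulin_regex, toy_sequence)
print(tubulin_regex, hit.start() if hit else None)
```

The reported position is 0-based; add 1 to obtain conventional residue numbering.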

Tubulin-beta mRNA autoregulation signal 

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. 
Unpolymerized tubulin subunits bind directly (or activate a factor(s) which binds 
co-translationally) to the nascent N-terminus of beta-tubulin. This binding is transduced through 
the adjacent ribosomes to activate an RNase that degrades the polysome-bound mRNA. The 
recognition element has been shown to be the first four amino acids of beta-tubulin: 
Met-Arg-Glu-Ile. Mutations to this sequence abolish the autoregulation effect (except for the 
replacement of Glu by Asp); transposition of this sequence to an internal region of a 
polypeptide also suppresses the autoregulatory effect. 

Consensus pattern: <M-R-[DE]-[IL] 

[ 1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988). 
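
The positional requirement described above (the signal must sit at the very N-terminus, and Asp may replace Glu) can be sketched in Python as follows. This is a minimal illustration under the assumption that the leading '<' of the pattern corresponds to the start-of-string anchor '^' of a regular expression; the function name and the example sequences are hypothetical.

```python
import re

# '<M-R-[DE]-[IL]': the leading '<' restricts the match to the N-terminus,
# which is written here as '^' in the regular expression.
AUTOREGULATION_SIGNAL = re.compile(r"^MR[DE][IL]")

def has_autoregulation_signal(protein: str) -> bool:
    """Return True if the sequence begins with Met-Arg-[Asp/Glu]-[Ile/Leu]."""
    return AUTOREGULATION_SIGNAL.match(protein.upper()) is not None

# Made-up example sequences illustrating the cases discussed in the text.
examples = {
    "signal at N-terminus": "MREIVHIQAG",
    "Glu replaced by Asp": "MRDIVHIQAG",
    "Arg replaced by Lys": "MKEIVHIQAG",
    "signal moved to an internal position": "AAMREIVHIQAG",
}
for label, sequence in examples.items():
    print(label, has_autoregulation_signal(sequence))
```

Only the first two example sequences satisfy the pattern; the substituted and the transposed variants do not, in line with the behaviour described for the autoregulation signal.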

678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures 

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and 
transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: a cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for 
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, 
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common 
folding pattern in their catalytic domain for the binding of ATP and amino acid which is 
different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA 
synthetases do not share a high degree of similarity; however, at least three conserved regions 
are present [2,5,8]. Signature patterns have been derived from two of these regions. 

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE] 

Consensus pattern: fGS^A4-v¥-F tiGSrALVF SE P ID NO:42)1-fCTNO-H-RKP4 !DENQHRKP 
S EQJD. NO: 4 3 ) j. 4G&FA|[G SI A S E 

10 fi^Mj ^ijyMFY SEP ID NO: 3 8) 1- 

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Delarue M., Moras D. 
BioEssays 15:675-687(1993).[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 4] Nagel 
G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).[ 5] Cusack S., 
Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).[ 6] Cusack S. 
Biochimie 75:1077-1081(1993).[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar 
N., Leberman R. Nature 347:249-255(1990).[ 8] Leveque F., Plateau P., Dessen P., Blanquet 
S. Nucleic Acids Res. 18:305-312(1990). 
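
The first class-II pattern above contains a variable-length gap, x(4,12), so the stretch of sequence it can match is not of fixed length. The following Python sketch computes the shortest and longest segment a pattern can cover. The function name match_length_bounds is hypothetical, and treating a trailing hyphen in a printed pattern as a line-break artifact is an assumption made for this example.

```python
import re
from typing import Tuple

def match_length_bounds(pattern: str) -> Tuple[int, int]:
    """Return the (shortest, longest) number of residues a pattern can match."""
    shortest = longest = 0
    for element in pattern.split("-"):
        element = element.strip().strip("<>")
        if not element:
            continue  # tolerate a stray leading or trailing hyphen
        # A repeat count is written (n) or (n,m); without one the element is one residue.
        count = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if count:
            lo = int(count.group(1))
            hi = int(count.group(2)) if count.group(2) else lo
        else:
            lo = hi = 1
        shortest += lo
        longest += hi
    return shortest, longest

CLASS_II_PATTERN_1 = "[FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]"
print(match_length_bounds(CLASS_II_PATTERN_1))
```

For this pattern the fixed elements account for 13 residues and the gap for 4 to 12 more, so a match spans between 17 and 25 residues.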

679. UBA-domain 

The UBA-domain (ubiquitin associated domain) is a novel sequence motif found in 
several proteins having connections to ubiquitin and the ubiquitination pathway. The 
structure of the UBA domain consists of a compact three-helix bundle [1]. 
Number of members: 84 

[1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 
Vpr. Dieckmann T, Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct 
Biol 1998;5:1042-1047. 

680. UBX domain 

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. 
Number of members: 19 

[1] The UBA domain: a sequence motif present in multiple enzyme classes of the 
ubiquitination pathway. Hofmann K, Bucher P; Trends Biochem Sci 1996;21:172-173. 



681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The first class consists 
of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and 
L3. - Yeast YUH1. - Drosophila Uch. One of the active site residues of class-I UCH [3] is a 
cysteine. A signature pattern has been derived from the region around that residue. 
Consensus pattern: Q-x(3)-N-rSAl-C-G-x(3)-f«VMtgT.JVK4 SEQ I D N O:4?](2)-H-[SA]- 
HAV-h£\\UVM S EP ID NO:4 yj-[SA] [C is the active site residue 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Johnston 
S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO J. 16:3787-3796(1997).[ 4] 
Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994). 

682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The second class 
consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, 
UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, 
UBP13, UBP14, UBP15 and UBP16. - Human tre-2. - Human isopeptidase T. - Human 
isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. - Drosophila fat 
facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - 
Caenorhabditis elegans hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical 
protein K02C4.3. These proteins share only two regions of similarity. The first region 
contains a conserved cysteine which is probably implicated in the catalytic mechanism. The 
second region contains two conserved histidine residues, one of which is also probably 
implicated in the catalytic mechanism. Signature patterns for both conserved regions have 
been developed. 

Consensus pattern: G-f L4VWt^}£LI VK|F Y. SEQ JD.NOU 8)j-x(l ,3)-[AGC 
5 SEQJllNO:M^ 

SEO jl) NO;39i)l-^ [C is the putative active site 

residue] 

Consensus pattern: Y-x-L-x-[SAG]-(t;J¥MFI^ 
x(4,5)-G-H-Y [The two H's are putative active site residues] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Rawlings 
N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994). 

683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The second class 
consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, 
UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, 
UBP13, UBP14, UBP15 and UBP16. - Human tre-2. - Human isopeptidase T. - Human 
isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. - Drosophila fat 
facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - 
Caenorhabditis elegans hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical 
protein K02C4.3. These proteins share only two regions of similarity. The first region 
contains a conserved cysteine which is probably implicated in the catalytic mechanism. The 
second region contains two conserved histidine residues, one of which is also probably 
implicated in the catalytic mechanism. Signature patterns for both conserved regions have 
been developed. 

Consensus pattern: G-fLP^F¥|(LIVMFY SEP ID NO: 1 8)j-x(1,3)-[AGC]-fflASM}[NASM 
SI.Q JD..NO:391j1-x-[WVMS4{ [C is the putative active site 

residue] 

Consensus pattern: Y>x-L-x4SAG14fe^Mm iLlVNfFT SE P I D NO:2 82)]-x(2>H-x-G- 
x(4,5)-G-H-Y [The two H's are putative active site residues] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Rawlings 
N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994). 

684. UDP-glycosyltransferases signature 

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of 
the glycosyl group from a UDP-sugar to a small hydrophobic molecule. This family currently 
consists of: - Mammalian UDP-glucuronosyl transferases (UDPGT) [1,2]. A large family of 
membrane-bound microsomal enzymes which catalyze the transfer of glucuronic acid to a 
wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major 
importance in the detoxification and subsequent elimination of xenobiotics such as drugs and 
carcinogens. - A large number of putative UDPGT from Caenorhabditis elegans. - 
Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3] (also known as 
UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer of galactose to 
ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are 
abundant sphingolipids of the myelin membrane of the central nervous system and peripheral 
nervous system. - Plant flavonol O(3)-glucosyltransferase. An enzyme [4] that catalyzes the 
transfer of glucose from UDP-glucose to a flavonol. This reaction is essential and one of the 
last steps in anthocyanin pigment biosynthesis. - Baculovirus ecdysteroid 
UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from 
UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the 
insect host interferes with the normal insect development by blocking the molting process. - 
Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved in carotenoid 
biosynthesis that catalyzes the glycosylation reaction which converts zeaxanthin to 
zeaxanthin-beta-diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These 
enzymes specifically inactivate macrolide antibiotics via 2'-O-glycosylation using 
UDP-glucose. These enzymes share a conserved domain of about 50 amino acid residues located in 
their C-terminal section and from which a pattern has been extracted to detect them. 

Consensus pattern: [FW]-x(2)-Q-x(2)-flvlVM¥Aj[| jyM'\'A. SEQJp NP:609)]- 
fkfMV tiLlMV SEP ID NO:34M-x(4 < 6VffcVGA€^fT..VGAC SEP ID NO:624Yj - 
|4,W¥AH 1..VFYA SEP ID NO:62SYj- hfcP^M-RjLjVMF SEP ID NP:2)j- 
{•S4A^€Mj[STAGCM.SEQ 
5 x(2)484^1[SIAG.SEi,)JDN;oa^ 

^WMFAti'IIVMFA SEP IP NO:81 V)-x(4V[POR]H^IVMy4jLlV W SEP IP NO:lV) -xf3V 
rPAl-x(3)-rDES1-ppKl#q iPEHN SEP IDNO:6 28Yi 

[ 1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, 
CRC Press, Boca Raton, (1980).[ 2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., 
Iyanagi T., Jansen P.L., Lancet D., Mulder G.J., Chowdhury J.R., Siest G., Tephly T.R., 
Mackenzie P.I. DNA Cell Biol. 10:487-494(1991).[ 3] Schulte S., Stoffel W. Proc. Natl. 
Acad. Sci. U.S.A. 90:10265-10269(1993).[ 4] Furtek D., Schiefelbein J.W., Johnston F., 
Nelson O.E. Jr. Plant Mol. Biol. 11:473-481(1988).[ 5] O'Reilly D.R., Miller L.K. Science 
245:1110-1112(1989).[ 6] Hernandez C., Olano C., Mendez C., Salas J.A. Gene 
134:139-140(1993). 

685. UDP-glucose/GDP-mannose dehydrogenase family 

The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes 
which possess the ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to 
an acid without the release of an aldehyde intermediate [2]. 
Number of members: 55 

[1] Purification and characterization of guanosine diphospho-D-mannose 
dehydrogenase. A key enzyme in the biosynthesis of alginate by Pseudomonas aeruginosa. 
Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS, Chakrabarty AM; J Biol Chem 
1989;264:9380-9385. 
[2] Properties and kinetic analysis of UDP-glucose dehydrogenase 
from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala 
RF, van de Rijn I, Tanner ME; J Biol Chem 1997;272:3416-3422. 

686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil 
residues from DNA by cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of 
misincorporation of dUMP residues by DNA polymerase or deamination of cytosine. The 
sequence of uracil-DNA glycosylase is extremely well conserved [2] in bacteria and 
eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are 
also found in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus 
and the mitochondria. Human UNG1 protein is transported to both the mitochondria and the 
nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to be required for mitochondrial 
localization [4], but the presence of a mitochondrial transit peptide has not been directly 
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region 
has been selected. This region contains an aspartic acid residue which has been proposed, 
based on X-ray structures [5,6], to act as a general base in the catalytic mechanism. 
1 0 Consensus pattern: [KR]-[LI V ]-R U V C SEP ID N 0 : 629 ) 1 4klVM4i I, J V M SHQ ID 
NO;4ij-x-G-[QI]-D-P-Y [D is the active site residue]- 

[ 1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988).[ 2] Olsen L.C., Aasland 
R., Wittwer C.U., Krokan H.E., Helland D.E. EMBO J. 8:3121-3125(1989).[ 3] Upton C., 
Stuart D.T., McFadden G. Proc. Natl. Acad. Sci. U.S.A. 90:4518-4522(1993).[ 4] Slupphaug 
G., Markussen F.-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan H.E., Helland 
D.E. Nucleic Acids Res. 21:2579-2584(1993).[ 5] Savva R., McAuley-Hecht K., Brown T., 
Pearl L. Nature 373:487-493(1995).[ 6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., 
Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878(1995).[ 7] Muller S.J., Caradonna S. 
Biochim. Biophys. Acta 1088:197-207(1991).[ 8] Meyer-Siegler K., Mauro D.J., Seal G., 
Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991).[ 9] 
Muller S.J., Caradonna S. J. Biol. Chem. 268:1310-1319(1993).[10] Barnes D.E., Lindahl T., 
Sedgwick B. Curr. Opin. Cell Biol. 5:424-433(1993). 

687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - 
Yeast chromosome II hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical 
protein F09E5.8. - Bacillus subtilis hypothetical protein ylmE. - Escherichia coli hypothetical 
protein yggS and HI0090, the corresponding Haemophilus influenzae protein. - Helicobacter 
pylori hypothetical protein HP0395. - Mycobacterium tuberculosis hypothetical protein 
MtCY270.20. - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A 
Pseudomonas aeruginosa hypothetical protein in pilT 5' region. - A Vibrio alginolyticus 
hypothetical protein in pilT 5' region. These are proteins of 25 to 30 Kd which contain a 
number of conserved regions. The best conserved region, which is located in the first third of 
these proteins, has been selected as a signature pattern. 

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV] 
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996). 
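
The phrase "picked up in the database" can be made concrete with a small scan over named sequences. The Python sketch below applies a hand translation of the UPF0001 pattern to a toy collection and reports 1-based coordinates for each hit. The sequences, the names toy_hit and toy_miss, and the function name scan are all hypothetical, and the regular expression is an assumed rendering of the pattern rather than a reference implementation.

```python
import re

# Hand translation of [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].
# Wrapping the expression in a lookahead lets finditer report overlapping hits.
UPF0001 = re.compile(r"(?=([FW]H[FM][IV]G.[LIV]Q.[NKR]K.{3}[LIV]))")

def scan(sequences):
    """Return (name, start, end, matched segment) tuples with 1-based, inclusive coordinates."""
    hits = []
    for name, sequence in sequences.items():
        for m in UPF0001.finditer(sequence):
            segment = m.group(1)
            start = m.start() + 1
            hits.append((name, start, start + len(segment) - 1, segment))
    return hits

# Made-up sequences: the second differs from the first by a single N -> D change
# at the [NKR] position, which is enough to lose the match.
toy_database = {
    "toy_hit": "AAFHFIGALQANKAAALAA",
    "toy_miss": "AAFHFIGALQADKAAALAA",
}
for hit in scan(toy_database):
    print(hit)
```

A single substitution at a constrained position removes the hit, which is why such a signature is selective for members of the family.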

688. Uncharacterized protein family UPF0003 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - 
Escherichia coli protein aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli 
hypothetical protein yjeP and HI0195.1, the corresponding Haemophilus influenzae protein. - 
Escherichia coli hypothetical protein ynaI. - Bacillus subtilis hypothetical protein yhdY. - 
Helicobacter pylori hypothetical protein HP0415. - Synechocystis strain PCC 6803 
hypothetical protein slr0639. - Archaeoglobus fulgidus hypothetical protein AF1546. - 
Methanococcus jannaschii hypothetical protein MJ0170. - Methanococcus jannaschii 
hypothetical protein MJ1143. The size of these proteins ranges from 30 to 120 Kd. They all 
contain a number of transmembrane regions. The best conserved region, which is located in 
and just after the last potential transmembrane region, has been selected as a signature 
pattern. 

Consensus pattern: G-fSTIl^l^ 
20 NO:£a-x(6)4LI-VMF}[LIVMF.S 
[ UVMF] LUVM^ 

[ 1] Bairoch A. Unpublished observations (1997). 

689. Uncharacterized protein family UPF0004 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - 
Escherichia coli hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and 
HI0019, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical 
protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter pylori 
hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 
5' region. - Mycobacterium leprae hypothetical protein B2235 C2195. - Pseudomonas 
aeruginosa hypothetical protein in hemL 3' region. - Synechocystis strain PCC 6803 
hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical protein sll0996. - 
Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii 
hypothetical protein MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size 
of these proteins ranges from 47 to 61 Kd. They contain six conserved cysteines, three of 
which are clustered in a region that can be used as a signature pattern. 
5 Consensus pattern: W-^M]-^ 

x(2>G-C-x(3VCn^¥Am( STAN SEP ID NO:250 VhrFY1-C-x-|j^Mj|LI VM SEP ID 
NO:4) j- x(4)-G 

[1] Bairoch A. Unpublished observations (1997). 

690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT 
(Testis Enhanced Gene Transcript). - Escherichia coli hypothetical protein yccA and HI0044, 
the corresponding Haemophilus influenzae protein. - A probable Pseudomonas aeruginosa 
ortholog of yccA. These are proteins of about 25 Kd which seem to contain seven 
transmembrane domains. A signature pattern that corresponds to a region that starts with the 
beginning of the third transmembrane domain and ends in the middle of the fourth one has 
been developed. 

Consensus pattern: G-|i4¥M]^ 
20 SEQjDNO^ 

x(3)-T-A-fWM^ 

[ 1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995). 

691. Uncharacterized protein family UPF0006 signatures 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - 
Yeast chromosome II hypothetical protein YBL055c. - Escherichia coli hypothetical protein 
ycfH and HI0454, the corresponding Haemophilus influenzae protein. - Escherichia coli 
hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081, the 
corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. 
- Haemophilus influenzae hypothetical protein HI1664. - Mycoplasma genitalium 
hypothetical protein MG009. These are proteins of 24 to 47 Kd which contain a number 
of conserved regions. They can be picked up in the database by the following patterns. 

Consensus pattern: [LWMF¥]{yV 
fWN4R fL!VNfF SEP FD NQ:2>1-[DN 

Consensus pattern: P4WM tiUVM. SEP ID NO:4) )-x4I^V^ jLlVM SEP ID NO:4Y| -H-x- 
R-x-[TA]-x-[DE 

Consensus pattern: itV^A][L VSA SEC^.ID. N 

x(2)-R^M|jTJV M S E P ID NP:4)i4PSl>x(3)-L-R^M}[ LIVM S E P I D N O:4Y}- 

R:4VM^(LIVMS SEP ID NO:429Yi -E-T- D-x-P 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995). 

692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical 
protein ygbP and HI0672, the corresponding Haemophilus influenzae protein. - Bacillus 
subtilis hypothetical protein yacM. - Mycobacterium tuberculosis hypothetical protein 
MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A 
Rhodobacter capsulatus hypothetical protein in nifR3 5' region. Except for the Rhodobacter 
protein, which contains a C-terminal extension, all these proteins have from 225 to 236 amino 
acids. They are hydrophilic proteins that can be picked up in the database by the following 
pattern. 

Consensus pattern: V-L-[IV]-H-D-[GA]-A-R 
[ 1] Bairoch A. Unpublished observations (1997). 
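
Because the UPF0007 pattern above has no wildcard positions, the full set of peptides it accepts is small and can be listed explicitly. The Python sketch below expands the bracketed alternatives with itertools.product. The function name enumerate_matches is hypothetical, and the approach is only an assumption-level illustration that works for patterns made solely of single residues and bracketed sets.

```python
import re
from itertools import product

def enumerate_matches(pattern: str):
    """List every peptide matched by a pattern made only of residues and [..] sets."""
    choices = []
    for element in pattern.split("-"):
        element = element.strip()
        if re.fullmatch(r"[A-Z]", element):
            choices.append(element)          # a fixed residue
        elif re.fullmatch(r"\[[A-Z]+\]", element):
            choices.append(element[1:-1])    # one residue out of the listed set
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    # product() picks one residue per position; the last position varies fastest.
    return ["".join(combination) for combination in product(*choices)]

UPF0007_PATTERN = "V-L-[IV]-H-D-[GA]-A-R"
for peptide in enumerate_matches(UPF0007_PATTERN):
    print(peptide)
```

With two positions offering two alternatives each, the pattern corresponds to four octapeptides.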

693. Uncharacterized protein family UPF0015 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - 
Yeast chromosome II hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical 
protein YMR101c. - Escherichia coli hypothetical protein yaeU and HI0920, the 
corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein 
HP1221. - Mycobacterium leprae hypothetical protein B1937 F265. - A Corynebacterium 
glutamicum hypothetical protein in aroF 3' region. - A Streptomyces fradiae hypothetical 
protein in transposon Tn4556. - Synechocystis strain PCC 6803 hypothetical protein sll0505. 
- Methanococcus jannaschii hypothetical protein MJ1372. These are proteins of about 26 to 
40 Kd whose central region is well conserved. They can be picked up in the database by the 
following pattern. 

Consensus pattern: rDE14faf¥MF tiLIVMF SEQ JD NO:2) |(3)-R-T-[SG]-G-xf2)-R-x-S-x- 
[FY]-fM¥M}[LIVM 
[ 1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

694. Uncharacterized protein family UPF0016 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Yeast hypothetical protein YBR187w. - Fission yeast hypothetical protein SpAC17G8.08c. -
Mouse protein pFT27. - Synechocystis strain PCC 6803 hypothetical protein sll0615. These
are hydrophobic proteins of 200 to 320 amino acids that seem to contain six or seven
transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this
family, to directly follow the second transmembrane domain has been selected as a signature
pattern.

Consensus pattern: E-(fciVMi[UV 
NQ:2)I(2)-A- 

[ 1] Bairoch A. Unpublished observations (1996). 


695. Uncharacterized protein family UPF0021 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Yeast chromosome VII hypothetical protein YGL211w. - Dictyostelium discoideum protein
veg136. - Methanococcus jannaschii hypothetical proteins MJ1157 and MJ1478. These are
proteins of 300 to 360 residues. They can be picked up in the database by the following
pattern, which is located in their N-terminal section.
Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D 
[ 1] Bairoch A. Unpublished observations (1997). 
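
The consensus patterns in these entries use the PROSITE notation: elements are separated by "-", "x" stands for any residue, square brackets list the residues allowed at a position, curly braces list excluded residues, and a number or range in parentheses gives a repeat count. As a minimal sketch of how such a pattern could be used to pick sequences out of a database, the following Python code translates a pattern into a regular expression and scans a sequence with it. The function names prosite_to_regex and find_matches are hypothetical, the test sequence is synthetic, and anchors such as "<" and ">" are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; it covers only x, [..], {..},
    single residues, and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing "-" left by truncated patterns
        # Split off an optional repeat suffix such as (2) or (22,23).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            regex = re.escape(core)           # a single fixed residue
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_matches(pattern: str, sequence: str) -> list:
    """Return the 0-based start positions of all non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]


UPF0021 = "C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D"

# Synthetic sequence built to satisfy the pattern (not a real protein).
demo = "MA" + "CKAAFAAAAE" + "A" * 22 + "SGGKD" + "LL"
print(prosite_to_regex(UPF0021))
print(find_matches(UPF0021, demo))
```

The same helper applies to the other intact patterns in this section; patterns damaged by text extraction would need to be restored from the original database entry before they can be converted.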


696. Uncharacterized protein family UPF0023 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Mouse protein 22A3. - Yeast chromosome XII hypothetical protein YLR022c. -
Caenorhabditis elegans hypothetical protein W06E11.4. - Methanococcus jannaschii
hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be
picked up in the database by the following pattern.
Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G- 
[ 1] Bairoch A. Unpublished observations (1997).

697. Uncharacterized protein family UPF0024 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Escherichia coli hypothetical protein ygbO and HI0701, the corresponding Haemophilus
influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome
XV hypothetical protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11.
- Methanococcus jannaschii hypothetical proteins MJ0588 and MJ1364. These are
hydrophilic proteins of 39 to 77 Kd. They can be picked up in the database by the
following pattern.


Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[SGC]-

[ 1] Bairoch A. Unpublished observations (1997). 


698. Uncharacterized protein family UPF0025 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Escherichia coli hypothetical protein yfcE. - Bacillus subtilis hypothetical protein ysnB. -
Mycoplasma genitalium and pneumoniae hypothetical protein MG207. - Methanococcus
jannaschii hypothetical proteins MJ0623 and MJ0936. These are hydrophilic proteins of
about 20 Kd. They can be picked up in the database by the following pattern.
Consensus pattern: D-V-[LIV]<2)-G-H-[ST]-^ 
N-P-G 

[ 1] Bairoch A. Unpublished observations (1997).



699. Uncharacterized protein family UPF0029 signature 
The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Yeast chromosome III hypothetical protein YCR59c. - Yeast chromosome IV hypothetical
protein YDL177C. - Escherichia coli hypothetical protein yigZ and HI0722, the
corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yvyE.
- A Thermus aquaticus hypothetical protein in pol 5' region. These proteins can be picked up
in the database by the following pattern.

Consensus pattern: G-x(2)-fkWM} f L! VM SEP ID NQ:4Yj (2)-x(2)-ffeWM4 1I.JVM SEP ID 

[FYW](2)-G-G-x(2HUVM^ 
[ 1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).

700. Uncharacterized protein family UPF0030 signature 

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast
chromosome VI hypothetical protein YFL060c. - Yeast chromosome XIII hypothetical
protein YMR095c. - Yeast chromosome XIV hypothetical protein YNL334c. - Bacillus
subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of
about 19 to 25 Kd. They can be picked up in the database by the following pattern.
Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]
[ 1] Bairoch A. Unpublished observations (1997). 

701. Uncharacterized protein family UPF0032 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Escherichia coli hypothetical protein yigU and HI0188, the corresponding Haemophilus
influenzae protein. - Bacillus subtilis hypothetical protein ycbT. - Mycobacterium
tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194.
- Odontella sinensis and Porphyra purpurea chloroplast hypothetical protein ycf43. These
proteins have from 245 to 317 amino acids and seem to contain at least six or seven
transmembrane regions. A conserved region located in the central section of these proteins
has been developed as a signature pattern.




Consensus pattern: Y-x(2)-F4IvWM^ 

FEQ1 4MVMR (I J VMF SEP ID NQ:2Y1-P- fUVMI fLIVM SEP ID NO:4Y) - 
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996). 


702. Uncharacterized protein family UPF0034 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Escherichia coli hypothetical protein yhdG and HI0979, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yjbN and HI0634, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein yohI
and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis
hypothetical protein yacF. - Rhodobacter capsulatus protein nifR3 and related proteins in
Azospirillum brasilense and Rhizobium leguminosarum. - Synechocystis strain PCC 6803
hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein sll0926. -
Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast
hypothetical protein YLR401c. - Yeast hypothetical protein YLR405w. - Yeast hypothetical
protein YML080w. Although it has been proposed [2] that Rhodobacter capsulatus nifR3 is a
transcriptional regulatory protein, it is believed that these proteins constitute a family of
enzymes whose active site could include a conserved cysteine, which has been used as the
central part of a signature pattern.

Consensus pattern: M^ILIYM 

N-x-G-C-P-x(3)4^^^AgQj[LlV MASO SE P ID NO:632 )j-x(S)-G-[SAC] 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995).[ 2] Foster-Hartnett D., Cullen
P.J., Gabbert K.K., Kranz R.G. Mol. Microbiol. 8:903-914(1993).


703. Uncharacterized protein family UPF0038 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: -
Escherichia coli hypothetical protein yacE and HI0890, the corresponding Haemophilus
influenzae protein. - Mycobacterium tuberculosis hypothetical protein MtCY01B2.23 and
O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain PCC 6803
hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila,
Bacteroides nodosus, Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus
and Xanthomonas campestris. - Human hypothetical protein pOV-2. - Yeast hypothetical
protein YDR196C. - Caenorhabditis elegans hypothetical protein T05G5.5. These proteins all
contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the
exception of the Mycobacterial sequences, which are 410 residues long). A conserved
region some 50 residues away from the ATP-binding P-loop has been developed as a
signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV] - 
[ 1] Rudd K.E., Bairoch A. Unpublished observations (1997). 


704. Ubiquitin-conjugating enzymes active site 

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent
attachment of ubiquitin to target proteins. An activated ubiquitin moiety is transferred from a
ubiquitin-activating enzyme (E1) to E2, which later ligates ubiquitin directly to substrate
proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most species
there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular
functions. A cysteine residue is required for ubiquitin-thiolester formation. There is a single
conserved cysteine in UBCs, and the region around that residue is conserved in the sequence
of known UBC isozymes. That region has been used as a signature pattern.

Consensus pattern: j FYWLSPj fFYWL SP SE P ID N O^r^ Vj-H-tPCl-pSfHj-tLIVl-xrS^G- 
x-[LIV]-C-[LIV]-x- [LIV] [C is the active site residue] 

[ 1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci.
15:195-198(1990).[ 2] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta
1089:127-139(1991).[ 3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).

705. Uroporphyrinogen decarboxylase signatures 

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic
pathway, catalyzes the sequential decarboxylation of the four acetyl side chains of
uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency is responsible for the
human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution.
The best conserved region is located in the N-terminal section; it contains a
perfectly conserved hexapeptide. There are two arginine residues in this hexapeptide which
could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate
side chains of the substrate. This region has been used as a signature pattern. A second
signature pattern is based on another well conserved region, which is located in the central
section of the protein.

Consensus pattern: P-x-W-x-M-R-Q-A-G-R 
Consensus pattern: G-F4STAGGVf[SJAG^ 
IQ.NQ:^ [GK] 
[ 1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P.
Eur. J. Biochem. 205:1011-1016(1992).

706. ubiE/COQ5 methyltransferase family signatures 

The following methyltransferases have been shown [1] to share regions of similarities: -
Escherichia coli ubiE, which is involved in both ubiquinone and menaquinone biosynthesis
and which catalyzes the S-adenosylmethionine dependent methylation of
2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis
methyltransferase. - Bacillus subtilis spore germination protein C2 (gene: gercB or gerC2), a
probable menaquinone biosynthesis methyltransferase. - Lactococcus lactis gerC2 homolog. -
Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein A41. These are hydrophilic proteins of about 30 Kd (except for
ZK652.9, which is 65 Kd). They can be picked up in the database by the following patterns.

Consensus pattern: Y-D-x-M-N-x(2Ht.WM

Consensus pattern: R-V-fMS^M}ri.JVM SEP ID NO:4)]-K4PVl-G-G-x-fi^^ 

SEC ID N O:2V|-x(2VI--M¥^fLIVM SEP I D NO :4Y[-E-x-S 

[ 1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997). 


707. Uricase signature 

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of
urate into allantoin. Some species, like primates and birds, have lost the gene for uricase and
are therefore unable to degrade urate. Uricase is a protein of 300 to 400 amino acids. A highly
conserved region located in the central part of the sequence has been used as a signature
pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-
x(2)-L-x(5)-R

[ 1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988). 

708. Universal stress protein family (Usp) 
By a wide range of stress conditions members of the Usp family are predicted to be
related to the MADS-box transcription factor proteins and to bind to DNA [2].
Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during
growth arrest. Nystrom T, Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains.
Mushegian AR, Koonin EV; Genetics 1996;144:817-828.

709. Ubiquitin domain signature and profile

Ubiquitin [1,2,3] is a protein of seventy-six amino acid residues, found in all eukaryotic cells
and whose sequence is extremely well conserved from protozoans to vertebrates. It plays a key
role in a variety of cellular processes, such as ATP-dependent selective degradation of
cellular proteins, maintenance of chromatin structure, regulation of gene expression, stress
response and ribosome biogenesis. In most species, there are many genes coding for
ubiquitin. However, they can be classified into two classes. The first class produces
polyubiquitin molecules consisting of exact head-to-tail repeats of ubiquitin. The number of
repeats is variable (up to twelve in a Xenopus gene). In the majority of polyubiquitin
precursors, there is a final amino acid after the last repeat. The second class of genes
produces precursor proteins consisting of a single copy of ubiquitin fused to a C-terminal
extension protein (CEP). There are two types of CEP proteins and both seem to be ribosomal
proteins. Ubiquitin is a globular protein, the last four C-terminal residues (Leu-Arg-Gly-Gly)
extending from the compact structure to form a 'tail', important for its function. The latter is
mediated by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage
between the C-terminal glycine and the epsilon amino group of lysine residues in the target
proteins. There are a number of proteins which are evolutionarily related to ubiquitin: -
Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea
viruses (BVDV). These proteins are highly similar to their eukaryotic counterparts. -

Mammalian protein GDX [4]. GDX is composed of two domains, an N-terminal ubiquitin-like
domain of 74 residues and a C-terminal domain of 83 residues with some similarity with the
thyroglobulin hormonogenic site. - Mammalian protein FAU [5]. FAU is a fusion protein
which consists of an N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein
S30. - Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 residues. - Human protein
BAT3, a large fusion protein of 1132 residues that contains an N-terminal ubiquitin-like
domain. - Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists of
an N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A. - Yeast
DNA repair protein RAD23 [8]. RAD23 contains an N-terminal domain that seems to be
distantly, yet significantly, related to ubiquitin. - Mammalian RAD23-related proteins
RAD23A and RAD23B. - Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a
protein of 274 residues that contains a central ubiquitin-like domain. - Human spliceosome
associated protein 114 (SAP 114 or SF3A120). - Yeast protein DSK2, a protein involved in
spindle pole body duplication and which contains an N-terminal ubiquitin-like domain. -

Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and
Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal
ubiquitin domain and a C-terminal CAP-Gly domain. - Schizosaccharomyces pombe
hypothetical protein SpAC26A3.16. This protein contains an N-terminal ubiquitin domain. -
Yeast protein SMT3. - Human ubiquitin-like proteins SMT3A and SMT3B. - Human
ubiquitin-like protein SMT3C (also known as PIC1, Ubl1, Sumo-1, Gmp-1 or Sentrin). This
protein is involved in targeting ranGAP1 to the nuclear pore complex protein ranBP2. -
SMT3-like proteins in plants and Caenorhabditis elegans. To identify ubiquitin and related
proteins, a pattern has been developed based on conserved positions in the central section of
the sequence. A profile was also developed that spans the complete length of the ubiquitin
domain.

Consensus pattern: K-x(2)-fU¥M|{LlVM SEP ID NP:4yj-x4DESAK}| DESA K SEP ID 
NO:4)I- [HVMQ]{LIVMC.MQ.ro.NO:I4
x(4)-[DE] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2]
Monia B.P., Ecker D.J., Crooke S.T. Bio/Technology 8:209-215(1990).[ 3] Finley D.,
Varshavsky A. Trends Biochem. Sci. 10:343-347(1985).[ 4] Filippi M., Tribioli C., Toniolo
D. Genomics 7:453-457(1990).[ 5] Olvera J., Wool I.G. J. Biol. Chem.
268:17967-17974(1993).[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun.
195:393-399(1993).[ 7] Jones D., Candido E.P. J. Biol. Chem. 268:19545-19551(1993).[ 8]
Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
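
The description of the first gene class (exact head-to-tail repeats of ubiquitin, usually followed by one final amino acid) can be illustrated with a short Python sketch that counts the repeats in a polyubiquitin precursor and returns whatever follows the last repeat. The 76-residue sequence below is the commonly cited human ubiquitin sequence and is an assumption brought in for illustration, as are the function name split_polyubiquitin and the synthetic precursor; none of them come from the entry itself.

```python
# Commonly cited human ubiquitin sequence (assumption, used for illustration).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def split_polyubiquitin(precursor: str, unit: str = UBIQUITIN):
    """Count exact head-to-tail copies of `unit` at the start of `precursor`.

    Returns (number_of_repeats, trailing_residues). Hypothetical helper.
    """
    count = 0
    pos = 0
    while precursor.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count, precursor[pos:]


# Synthetic precursor: three exact repeats followed by one extra residue.
precursor = UBIQUITIN * 3 + "C"
repeats, tail = split_polyubiquitin(precursor)
print(len(UBIQUITIN), UBIQUITIN[-4:])  # length and C-terminal 'tail'
print(repeats, tail)
```

Because the repeats are stated to be exact, plain string comparison is enough here; ubiquitin-like domains such as those of GDX or RAD23 are only similar, not identical, and would need the pattern or profile instead.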


710. VHS domain 

Domain present in VPS-27, Hrs and STAM. Number of members: 27 


711. Vinculin family signatures 

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the
actin-based microfilaments to the plasma membrane. Vinculin is located at the cytoplasmic side of
focal contacts or adhesion plaques. In addition to actin, vinculin interacts with other structural
proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd (about 1000
residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd
separated from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50
residues. The central part of the N-terminal domain consists of a variable number (3 in
vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110 amino acid domain. Catenins
[2] are proteins that associate with the cytoplasmic domain of a variety of cadherins. The
association of catenins to cadherins produces a complex which is linked to the actin filament
network, and which seems to be of primary importance for the cell-adhesion properties of
cadherins. Three different types of catenins seem to exist: alpha, beta, and gamma.
Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin. In terms of
their structure the most significant differences are the absence, in alpha-catenin, of the
repeated domain and of the proline-rich segment. Two signature patterns for this family of
proteins have been developed. The first pattern is located in the N-terminal section of both
vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to be involved
in the interaction with talin. The second pattern is based on a conserved region in the
N-terminal part of the repeated domain of vinculin.
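
The sizes quoted in these entries are given sometimes in Kd and sometimes in residues. A rough cross-check between the two is possible with the rule of thumb that an average amino acid residue contributes about 110 Da; that figure is an assumption brought in here, not a value from the text, and the function names are hypothetical. A minimal Python sketch:

```python
# Rule-of-thumb average residue mass in daltons (assumption, not from the text).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kd(n_residues: int) -> float:
    """Estimate the mass in Kd of a protein with the given number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def approx_residues(mass_kd: float) -> int:
    """Estimate the number of residues of a protein with the given mass in Kd."""
    return round(mass_kd * 1000.0 / AVERAGE_RESIDUE_MASS_DA)


# Vinculin is described as 116 Kd and about 1000 residues.
print(approx_mass_kd(1000))
print(approx_residues(116))
```

Both estimates land close to the quoted figures, which is all such a rule of thumb can show; real masses depend on composition and modifications.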

Consensus pattern: \KR]-x-lhWM¥ : \ \UVMF S EP ID NO:2)1-x(3)4fcj-VMA4jiJVM.A SEP 
lDNO:30j]-x(2HM^ 
Consensus pattern: £M¥M];^

[ 1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990).[ 2] Herrenknecht K., Ozawa M.,
Eckerskorn C., Lottspeich F., Lenter M., Kemler R. Proc. Natl. Acad. Sci. U.S.A.
88:9156-9160(1991).


712. (Vitellogenin N) Lipoprotein amino terminal region 

This family contains regions from: vitellogenin, microsomal triglyceride transfer
protein and apolipoprotein B-100. These proteins are all involved in lipid transport [1]. This
family contains the LV1n chain from lipovitellin, which contains two structural domains.
Number of members: 33

[1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein.
Anderson TA, Levitt DG, Banaszak LJ; Structure 1998;6:895-909.

713. (VMSA) Major surface antigen from hepadnavirus

714. ssDNA binding protein (Viral DNA bp) 

This protein is found in herpesviruses and is needed for
replication.

715. (Voltage CLC) Voltage gated chloride channels

This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a
single pore. It has been shown that some members of this family form homodimers. These
proteins contain two CBS domains.
[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM,
Ptacek LJ; Neurology 1996;47:993-998.

716. von Willebrand factor type A domain (vwa) 
More von Willebrand factor type A domains? Sequence
similarities with malaria thrombospondin-related
anonymous protein, dihydropyridine-sensitive calcium
channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K;
Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.
Type A modules: interacting domains found in several non-fibrillar
collagens and in other extracellular matrix proteins.
MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D.
and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in
factor B of human complement by Fourier transform infrared spectroscopy.
Its occurrence in collagen types VI, VII, XII and XIV, the integrins and
other proteins by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).


4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with
malaria thrombospondin-related anonymous protein, dihydropyridine-
sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted
to be similar to the open twisted beta-sheet flanked by alpha-helices
found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3
(CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1,
alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

The von Willebrand factor is a large multimeric glycoprotein found in blood 
plasma. Mutant forms are involved in the aetiology of bleeding disorders 
[1]. In von Willebrand factor, the type A domain (vWF) is the prototype for 
a protein superfamily. The vWF domain is found in various plasma proteins: 
complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen 
types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins 
that incorporate vWF domains participate in numerous biological events 
(e.g., cell adhesion, migration, homing, pattern formation, and signal 
transduction), involving interaction with a large array of ligands [2]. 
Secondary structure prediction from 75 aligned vWF sequences has revealed 
a largely alternating sequence of alpha-helices and beta-strands [3]. Fold 
recognition algorithms were used to score sequence compatibility with a 
library of known structures: the vWF domain fold was predicted to be a 
doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5].
3D structures have been determined for the I-domains of integrins CD11b
(with bound magnesium) [6] and CD11a (with bound manganese) [7]. The domain
adopts a classic alpha/beta Rossmann fold and contains an unusual metal
ion coordination site at its surface. It has been suggested that this site
represents a general metal ion-dependent adhesion site (MIDAS) for binding
protein ligands [6]. The residues constituting the MIDAS motif in the CD11b
and CD11a I-domains are completely conserved, but the manner in which the
metal ion is coordinated differs slightly [7].

VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF
domain superfamily. The fingerprint was derived from an initial alignment
of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved
residues involved in metal ion coordination in I-domains (Asp and 2 serines
in positions 8, 10 and 12, respectively); motif 2 spans strands beta-2 and
beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position
7), which coordinates the metal ion [6,7]. Three iterations on OWL27.0 were
required to reach convergence, at which point a true set comprising 56
sequences was identified. Numerous partial matches were also found.


717. (WD40) WD domain, G-beta repeat 

The ancient regulatory-protein family of WD-repeat proteins. 

Neer EJ, Schmidt CJ, Nambudripad R, Smith TF; 

Nature 1994;371:297-300. 

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma)
of the guanine nucleotide-binding proteins (G proteins) which act as
intermediaries in the transduction of signals generated by transmembrane
receptors [1]. The alpha subunit binds to and hydrolyzes GTP; the functions of
the beta and gamma subunits are less clear but they seem to be required for
the replacement of GDP by GTP as well as for membrane anchoring and
receptor recognition.

In higher eukaryotes G-beta exists as a small multigene family of highly
conserved proteins of about 340 amino acid residues. Structurally G-beta
consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40
repeat). Such a repetitive segment has been shown [E1,2,3,4,5] to exist in a
number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta
like protein that associates with GPA1 (G-alpha) and STE18 (G-gamma).
- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is
most probably also a G-beta protein.
- Human and chicken protein 12.3. The function of this protein is not known,
but on the basis of its similarity to G-beta proteins, it may also function
in signal transduction.
- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog
of vertebrate protein 12.3.
- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].
- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic
protein complex that reversibly associates with Golgi membranes to form
vesicles that mediate biosynthetic protein transport.

- Yeast CDC4, essential for initiation of DNA replication and separation of
the spindle pole bodies to form the poles of the mitotic spindle.
- Yeast CDC20, a protein required for two microtubule-dependent processes:
nuclear movements prior to anaphase and chromosome separation.
- Yeast MAK11, essential for cell growth and for the replication of M1
double-stranded RNA.
- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with
a probable role in mRNA splicing.
- Yeast PWP1, a protein of unknown function.
- Yeast SKI8, a protein essential for controlling the propagation of
double-stranded RNA.
- Yeast SOF1, a protein required for ribosomal RNA processing which
associates with U3 small nucleolar RNA.

- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been
implicated in dTMP uptake, catabolite repression, mating sterility, and
many other phenotypes.
- Yeast YCR57c, an ORF of unknown function from chromosome III.
- Yeast YCR72c, an ORF of unknown function from chromosome III.
- Slime mold coronin, an actin-binding protein.
- Slime mold AAC3, a developmentally regulated protein of unknown function.
- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'),
a protein involved in neurogenesis and that seems to interact with the
Notch and Delta proteins.
- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and
Groucho) and 8 (G-beta, STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and
G-beta like proteins, the repeats span the entire length of the sequence, while
in other proteins, they make up the N-terminal, the central or the C-terminal
section.

A signature pattern can be developed from the central core of the domain 
(positions 9 to 23). 

2 5 -Consensus pattern: f 

P^N4F ^ r WSTAG C l |'i.JVMFYWSTAGC SEP ID NO:635)1 4I . J.NfSTAGl jLIMSTAG SEP 
IDNQ:636]G-^ 
x(2M!AVMW 


[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaiatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).

718. WHEP-TRS domain containing proteins

A conserved domain of 46 amino acids has been shown [1] to exist in a number 
of higher eukaryote aminoacyl-transfer RNA synthetases. This domain is present 
one to six times in the following enzymes: 

- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present
three times in a region that separates the N-terminal glutamyl-tRNA
synthetase domain from the C-terminal prolyl-tRNA synthetase domain.
- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present
six times in the intercatalytic region.
- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the
N-terminal extremity.
- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is
found at the N-terminal extremity [2].
- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal
extremity.

This domain, which is called WHEP-TRS, could contain a central alpha-helical 
region and may play a role in the association of tRNA-synthetases into 
multienzyme complexes. 

A signature pattern based on the first 29 positions of the WHEP-TRS
domain has been developed.

-Consensus pattern: [QY]-G-[DNEA]-[D^
fKRNG4iKRNG SEP ID NO:642Vj -[AS]-x(4)- 
rLIVI-TON^ IDENK SEP ID NO:643)^ x(;2V[IV]-x(2)-L-x(3)-K 

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M.
EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

719. (Worm family 8) Putative membrane protein 

[1] Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.

720. Xylose isomerase 

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which
catalyzes the interconversion of D-xylose and D-xylulose. It can also isomerize
D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems to
require magnesium for its activity, while cobalt is necessary to stabilize the
tetrameric structure of the enzyme. A number of residues are conserved in all
known xylose isomerases.

Xylose isomerase also exists in plants [2] where it is homodimeric and is 
manganese-dependent. 

Two signature patterns for xylose isomerase have been developed. The first one is
derived from a stretch of five conserved amino acids that includes a glutamic
acid residue known to be one of the four residues involved in the binding of
the magnesium ion [3]; this pattern also includes a lysine residue which is
involved in the catalytic activity. The second pattern is derived from a
conserved region in the N-terminal section of the enzyme that includes a
histidine residue which has been shown [4] to be involved in the catalytic
mechanism of the enzyme.

-Consensus pattern: [LI]-E-P-K-P-x(2)-P 
[E is a magnesium ligand] 
[K is an active site residue] 
-Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE] 
[H is an active site residue]
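
The two xylose isomerase patterns are written in PROSITE syntax, where x is any residue, [..] is a set of allowed residues and x(n) is a repeat. A minimal Python sketch of how such a pattern could be turned into a regular expression and used to scan a sequence follows. The function name prosite_to_regex and the test peptide are hypothetical and were invented for illustration; they are not part of the source.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split an optional repeat such as (2) or (2,4) off the residue part.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                          # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # excluded residues
        # Bracketed sets such as [LIV] are already valid regex classes.
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return "".join(parts)

# The two patterns quoted in the text.
XYLOSE_ISOMERASE_1 = "[LI]-E-P-K-P-x(2)-P"
XYLOSE_ISOMERASE_2 = "[FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]"

# Hypothetical peptide, invented for illustration only.
peptide = "MAALEPKPNEPRG"
hit = re.search(prosite_to_regex(XYLOSE_ISOMERASE_1), peptide)
print(hit.start() if hit else None)
```

The converter only covers the syntax elements that occur in this section (single residues, sets, exclusions and x(n) or x(n,m) repeats); anchors and other PROSITE extensions are deliberately left out.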

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal
recessive disease, characterized by a high incidence of sunlight-induced skin cancer. Skin
cells of individuals with this condition are hypersensitive to ultraviolet light, due to defects
in the incision step of DNA excision repair. There are a minimum of seven genetic
complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can
be corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family
of proteins [2,3,4,5,6] that is composed of two main subsets: - Subset 1, to which belong
XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are
single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide
excision repair [9]. - Subset 2, to which belong mouse and human FEN-1, rad2 from fission
yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease. In
addition to the proteins listed in the above groups, this family also includes: - Fission yeast
exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects
mismatched base pairs. - Yeast EXO1 (DHS1), a protein with probably the same function as
exo1. - Yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are
largely confined to two regions. The first is located at the N-terminal extremity (N-region)
and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region)
and found towards the C-terminus; it spans about 140 residues and contains a highly
conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]).
It is possible that the conserved acidic residues are involved in the catalytic mechanism of
DNA excision repair in XPG. The amino acids linking the N- and I-regions are not
conserved; indeed, they are largely absent from proteins belonging to the second subset. Two
signature patterns have been developed for these proteins. The first corresponds to the central
part of the N-region, the second to part of the I-region and includes the putative catalytic core
pentapeptide.

1 5 Consensus pattern: [VI]-[KRE]-P-x-ff¥l-Ll[FYIL SEP ID NQ :644 )]-V-F-D-G-x(2)-[PIL]-x- 
[LVC]-K- 

Consensus pattern: [GS]-{W¥M}|I JVM 

ID NO;4 )] -x-A-P-x-E-A-[DE]-[PAS]- [QS]-[CLM]- 

[ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).



722. Xanthine/uracil permeases family 

The following transport proteins which are involved in the uptake of xanthine
or uracil are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.

- Purine permease (gene uapC) from Aspergillus nidulans.

- Xanthine permease from Bacillus subtilis (gene pbuX).

- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene
pyrP).

- Hypothetical protein ycdG from Escherichia coli.

- Hypothetical protein ygfO from Escherichia coli.

- Hypothetical protein ygfU from Escherichia coli.

- Hypothetical protein yicE from Escherichia coli.

- Hypothetical protein yunJ from Bacillus subtilis.

- Hypothetical protein yunK from Bacillus subtilis.

They are proteins of 430 to 595 residues that seem to contain 12
transmembrane domains.

The best conserved region, which corresponds to what seems to
be the tenth transmembrane domain, has been selected as a signature pattern.

2 0 -Consensus pattern: {;fciVM]|LlVM 
V-[l..; iVM ][UVM..SEQJD 
f-UVMillLIVM SEQ lDNO:4Yj-x(3VG 

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C.
J. Biol. Chem. 270:8610-8622(1995).
[ 2] Andersen P.S., Frees D., Fast R., Mygind B.
J. Bacteriol. 177:2008-2013(1995).

723. Hypothetical yabO/yceC/sfhB family 

The following proteins, which seem to belong to a family of pseudouridine
synthases (EC 4.2.1.70) [1], have been shown to share regions of similarity:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit
pseudouridine synthase A (gene rluA). It is responsible for synthesis of
pseudouridine from uracil-746 in 23S rRNA.

- Escherichia coli and Haemophilus influenzae ribosomal large subunit
pseudouridine synthase C (gene rluC). It is responsible for synthesis of
pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.

- Escherichia coli large subunit pseudouridine synthase D (gene rluD) and
its homologs in other bacteria.

- Yeast DRAP deaminase (gene RIB2).

- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding
Haemophilus influenzae protein.

- Haemophilus influenzae hypothetical protein HI0042.

- Aquifex aeolicus hypothetical protein AQ_1758.

- Bacillus subtilis hypothetical protein yhcT.

- Bacillus subtilis hypothetical protein yjbO.

- Bacillus subtilis hypothetical protein ylyB.

- Helicobacter pylori hypothetical protein HP0347.

- Helicobacter pylori hypothetical protein HP0745.

- Helicobacter pylori hypothetical protein HP0956.

- Mycoplasma genitalium hypothetical protein MG209.

- Mycoplasma genitalium hypothetical protein MG370.

- Synechocystis strain PCC 6803 hypothetical protein slr1592.

- Synechocystis strain PCC 6803 hypothetical protein slr1629.

- Yeast hypothetical protein YDL036c.

- Yeast hypothetical protein YGR169c.

- Fission yeast hypothetical protein SpAC18B11.02c.

- Caenorhabditis elegans hypothetical protein K07E8.7.

These are proteins of 21 to 50 Kd which contain a number of conserved
regions in their central section. They can be picked up in the database by the
following highly conserved pattern.

-Consensus pattern: {«V-GAi[U^^
NO:647}j-R-[LI]-D-x^
1' SGTAC V 1 1'SGTAC V SEP TP NO:650)]

[ 1] Conrad J., Sun D., Englund N., Ofengand J.
J. Biol. Chem. 273:18562-18566(1998).

In addition, the following bacterial proteins, which seem to belong to a family of
pseudouridine synthases (EC 4.2.1.70) [1], have also been shown to share regions of
similarity:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516
synthase (EC 4.2.1.70) (gene: rsuA). This enzyme is responsible for the
formation of pseudouridine from uracil-516 in 16S ribosomal RNA.

- Escherichia coli hypothetical protein yciL and HI1199, the corresponding
Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yjbC.

- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding
Haemophilus influenzae protein.

- Aquifex aeolicus hypothetical protein AQ_554.

- Aquifex aeolicus hypothetical protein AQ_1464.

- Bacillus subtilis hypothetical protein ypuL.

- Bacillus subtilis hypothetical protein ytzF.

- Borrelia burgdorferi hypothetical protein BB0129.

- Helicobacter pylori hypothetical protein HP1459.

- Synechocystis strain PCC 6803 hypothetical protein slr0361.

- Synechocystis strain PCC 6803 hypothetical protein slr0612.

These are proteins of 25 to 40 Kd which contain a number of conserved
regions in their central section. They can be picked up in the database by the
following highly conserved pattern.

-Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-fM-V^

[ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J.
Biochemistry 34:8904-8913(1995).

724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin 

Putative zinc finger; binding not yet shown. 


725. Zinc carboxypeptidase 

There are a number of different types of zinc-dependent carboxypeptidases (EC 
3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally 
related. The enzymes that belong to this family are listed below. 


- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that
removes all C-terminal amino acids with the exception of Arg, Lys and Pro.

- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a
specificity similar to that of carboxypeptidase A1, but with a preference
for bulkier C-terminal residues.

- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but
one that preferentially removes C-terminal Arg and Lys.

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase),
a plasma enzyme which protects the body from potent vasoactive and
inflammatory peptides containing C-terminal Arg or Lys (such as kinins or
anaphylatoxins) which are released into the circulation.

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or
carboxypeptidase E), an enzyme located in secretory granules of pancreatic
islets, adrenal gland, pituitary and brain. This enzyme removes residual
C-terminal Arg or Lys remaining after initial endoprotease cleavage during
prohormone processing.

- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific
enzyme.
It is ideally situated to act on peptide hormones at local tissue sites
where it could control their activity before or after interaction with
specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity
similar to that of carboxypeptidase A, but found in the secretory granules
of mast cells.

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which
combines the specificities of mammalian carboxypeptidases A and B.

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4],
which also combines the specificities of carboxypeptidases A and B.

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems
to regulate transcription by cleavage of other transcriptional proteins.

- Yeast hypothetical protein YHR132c.

All of these enzymes bind an atom of zinc. Three conserved residues are
implicated in the binding of the zinc atom: two histidines and a glutamic acid.

Two signature patterns which contain these three zinc ligands have been derived.

-Consensus pattern: rPK]-x-fLWMF¥ir UVMFY SEP ID NO: 18)] -x-ffct¥MF¥4 fUVMFY 
2 0 NO:4}]- 

[H and E are zinc ligands] 
-Consensus pattern: H-fSTAG}[ STA G SE C ID NO:20) ]-xQ)-fLP^Mfij!' LrVME SEP ID 
NQ;6!^-x(2H^ 

[H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).
[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F.,
Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).
[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).
[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I.,
Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P.,
Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).
[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).
[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L.,
Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

726. Zinc finger, C2H2 type 
The C2H2 zinc finger is the classical zinc finger domain.
The two conserved cysteines and histidines co-ordinate a
zinc ion. The following pattern describes the zinc finger.

#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Where X can be any amino acid, and numbers in brackets
indicate the number of residues. The positions marked # are
those that are important for the stable fold of the zinc
finger. The final position can be either His or Cys.
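
The spacing pattern given above can be sketched as a regular expression in Python. This is only an illustration under one stated assumption: the text does not say which residues are allowed at the positions marked #, so they are treated here as "any residue". The names C2H2_REGEX and find_fingers, and the test peptide, are hypothetical and were chosen for this example.

```python
import re

# Pattern from the text: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
# Assumption: '#' (fold-stabilising position) is matched as any residue,
# because the allowed residues at those positions are not stated.
C2H2_REGEX = re.compile(
    r"..C"         # #, X, first cysteine
    r".{1,5}C"     # X(1-5), second cysteine
    r".{3}."       # X3, #
    r".{5}."       # X5, #
    r".{2}H"       # X2, first histidine
    r".{3,6}[HC]"  # X(3-6), final His or Cys
)

def find_fingers(sequence):
    """Return (zero-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in C2H2_REGEX.finditer(sequence)]

# Illustrative peptide with one finger-like stretch.
peptide = "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
for start, text in find_fingers(peptide):
    print(start, text)
```

Because the # positions are left unconstrained, this expression is more permissive than the pattern it is derived from and will report extra hits on real sequence data.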

The C2H2 zinc finger is composed of two short beta strands
followed by an alpha helix. The amino terminal part of the
helix binds the major groove in DNA binding zinc fingers.

'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first
identified in the Xenopus transcription factor TFIIIA. These domains have
since been found in numerous nucleic acid-binding proteins. A zinc finger
domain is composed of 25 to 30 amino-acid residues. There are two cysteine or
histidine residues at both extremities of the domain, which are involved in
the tetrahedral coordination of a zinc atom. It has been proposed that such a
domain interacts with about five nucleotides. A schematic representation of a
zinc finger domain is shown below:

                          x  x
                         x    x
                         x    x
                         x    x
                         x    x
                         x    x
                          C  H
                        x  \/  x
                        x  Zn  x
                        x  /\  x
                          C  H
                  x x x x x  x x x x x

Many classes of zinc fingers are characterized according to the number and 
positions of the histidine and cysteine residues involved in the zinc atom 
coordination. In the first class to be characterized, called C2H2, the first 
pair of zinc coordinating residues are cysteines, while the second pair are 
histidines. A number of experimental reports have demonstrated the zinc- 
dependent DNA or RNA binding property of some members of this class. 

Some of the proteins known to include C2H2-type zinc fingers are listed below.
The number of zinc finger regions found in each of these proteins is indicated
between brackets; a '+' symbol indicates that only partial sequence
data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2),
MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1),
STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).

- Emericella nidulans: brlA (2), creA (2).

- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5),
Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped (4),
Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta
(6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor
of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).

- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin
(37!!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), Oocyte
XlcOF2 to XlcOF22 (from 7 to 12).

- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like
transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3)
and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4),
EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1
(10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1
(9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX
(13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1
(13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

In addition to the conserved zinc ligand residues it has been shown [6] that a 
number of other positions are also important for the structural integrity of 
the C2H2 zinc fingers. The best conserved position is found four residues 
after the second cysteine; it is generally an aromatic or aliphatic residue. 


-Consensus pattern: C-x(2,4)-C-x(3)- [LI¥MFYWC] [U^^ 
x(3,5)-H 

[The two C's and two H's are zinc ligands] 

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).
[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).
[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).
[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

727. Zinc finger, C3HC4 type (RING finger) 

A number of eukaryotic and viral proteins contain a conserved cysteine-rich
domain of 40 to 60 residues (called C3HC4 zinc-finger or 'RING' finger) [1]
that binds two atoms of zinc, and is probably involved in mediating
protein-protein interactions. The 3D structure of the zinc ligation system is unique
to the RING domain and is referred to as the "cross-brace" motif. The spacing
of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to
3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
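
The ligand spacing just described can be checked mechanically. The following Python sketch, given only as an illustration, encodes the spacing as a regular expression with one capture group per zinc ligand and reports the zero-based positions of the eight ligands. The names RING_REGEX and ring_ligand_positions are hypothetical, and the test sequence is synthetic filler built to satisfy the spacing rather than a real protein.

```python
import re

# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C
# Each zinc ligand (seven Cys, one His) gets its own capture group.
RING_REGEX = re.compile(
    r"(C).{2}(C).{9,39}(C).{1,3}(H).{2,3}(C).{2}(C).{4,48}(C).{2}(C)"
)

def ring_ligand_positions(sequence):
    """Return the zero-based positions of the eight ligands, or None."""
    m = RING_REGEX.search(sequence)
    if m is None:
        return None
    return [m.start(i) for i in range(1, 9)]

# Synthetic sequence: alanine filler placed between the ligands.
synthetic = (
    "C" + "AA" + "C" + "A" * 10 + "C" + "A" + "H"
    + "AA" + "C" + "AA" + "C" + "A" * 5 + "C" + "AA" + "C"
)
print(ring_ligand_positions(synthetic))
```

The function returns only the first hit; the wide x(9 to 39) and x(4 to 48) ranges mean that on real sequences several alternative ligand assignments may satisfy the spacing.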

Proteins currently known to include the C3HC4 domain are listed below 
(references are only provided for recently determined sequences). 

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1
activates the rearrangement of immunoglobulin and T-cell receptor genes.

- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression
directed by the promoter region of the interleukin-2 receptor alpha chain
or the LTR promoter region of HIV-1.

- Human rfp. Rfp is a developmentally regulated protein that may function in
male germ cell development. Recombination of the N-terminal section of rfp
with a protein tyrosine kinase produces the ret transforming protein.

- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A
ribonucleoprotein complex. Sera from patients with systemic lupus
erythematosus or primary Sjogren's syndrome often contain antibodies that
react with the Ro proteins.

- Human histocompatibility locus protein RING1.

- Human PML, a probable transcription factor. Chromosomal translocation of
PML with retinoic receptor alpha creates a fusion protein which is the
cause of acute promyelocytic leukemia (APL).

- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].

- Mammalian cbl proto-oncogene.

- Mammalian bmi-1 proto-oncogene.

- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that
stabilizes the complex between the CDK7 kinase and cyclin H (MAT1 stands
for 'Menage A Trois').

- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor
cells, is a transcriptional repressor that recognizes and binds a specific
DNA sequence.

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat
involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are
responsible for a form of Zellweger syndrome, an autosomal recessive
disorder associated with peroxisomal deficiencies.

- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.

- Human RING1 protein.

- Xenopus XNF7 protein, a probable transcription factor.

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the
posttranscriptional regulation of genes in VSG expression sites or may
interact with adenylate cyclase to regulate its activity.

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste
(Su(z)2). The two proteins belong to the Polycomb group of genes needed to
maintain the segment-specific repression of homeotic selector genes.

- Drosophila protein male-specific msl-2, a DNA-binding protein which is
involved in X chromosome dosage compensation (the elevation of
transcription of the male single X chromosome).

- Arabidopsis thaliana protein COP1, which is involved in the regulation of
photomorphogenesis.

- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8.

- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein,
which has been characterized in many different herpesviruses, is a
trans-activator and/or -repressor of the expression of many viral and cellular
promoters.

- Baculoviruses protein CG30.

- Baculoviruses major immediate early protein (PE-38).

- Baculoviruses immediate-early regulatory protein IE-N/IE-2.

- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.

- Yeast hypothetical proteins YER116c and YKR017c.

The central region of the domain was selected as a signature pattern 
for the C3HC4 finger. 

-Consensus pattern: C-x-H-x-{i^MR^ 
10 fePm¥-A|[I/l VM Y A SEC) ID NO:609)[ 

[ 1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. 6:395-401(1996).

728. Zinc finger C-x8-C-x5-C-x3-H type (and similar). 

729. Zinc finger, CCHC class 

A family of CCHC zinc fingers, mostly from retroviral gag
proteins (nucleocapsid). Prototype structure is from HIV.
Also contains members involved in eukaryotic gene
regulation, such as C. elegans GLH-1.

Structure is an 18-residue zinc finger; no examples of indels
in the alignment.

730. Zn-finger in Ran binding protein and others. 



731. AN1-like Zinc finger



Reference No. 2750-942P 

Zinc finger at the C-terminus of An1 Swiss:Q91889, a ubiquitin-like protein in Xenopus
laevis. The following pattern describes the zinc finger.

C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C

Where X can be any amino acid, and numbers in brackets indicate the
number of residues.

[1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.

732. 14-3-3 proteins

Structure of a 14-3-3 protein and implications for coordination of multiple
signalling pathways.
Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ;
Nature 1995;376:188-191.

Crystal structure of the zeta isoform of the 14-3-3 protein.
Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;
Nature 1995;376:191-194.

Interaction of 14-3-3 with signaling proteins is mediated by the
recognition of phosphoserine.
Muslin AJ, Tanner JW, Allen PM, Shaw AS;
Cell 1996;84:889-897.

The 14-3-3 protein binds its target proteins with a common site
located towards the C-terminus.
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S,
Isobe T;
FEBS Lett 1997;413:273-276.

Molecular evolution of the 14-3-3 protein family.
Wang W, Shakes DC;
J Mol Evol 1996;43:384-398.

Function of 14-3-3 proteins.
Jin DY, Lyu MS, Kozak CA, Jeang KT;
Nature 1996;382:308-308.

The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric
proteins of about 30 Kd which were first identified as being very abundant in
mammalian brain tissues and located preferentially in neurons. The 14-3-3
proteins seem to have multiple biological activities and play a key role in
signal transduction pathways and the cell cycle. They interact with kinases
such as PKC or Raf-1; they also seem to function as protein-kinase dependent
activators of tyrosine and tryptophan hydroxylases, and in plants they are
associated with a complex that binds to the G-box promoter elements.

The 14-3-3 proteins are found in all eukaryotic species
studied and have been sequenced in fungi (yeast BMH1 and BMH2, fission yeast
rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the
14-3-3 proteins are extremely well conserved. Two highly conserved regions have
been selected as signature patterns: the first is a peptide of 11 residues
located in the N-terminal section; the second, a 20 amino acid region located
in the C-terminal section.

-Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

-Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W- 
[TAN]-[SAD] 
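
Since neither 14-3-3 pattern contains a variable-length gap, a candidate peptide can be compared with a pattern position by position. The Python sketch below is an illustration only: it parses each pattern into one set of allowed residues per position and reports the first position at which a peptide disagrees. The function names and the test peptides are hypothetical and are not taken from the source.

```python
def parse_pattern(pattern):
    """Turn 'R-N-[LIV]' into a list of allowed-residue sets, one per position."""
    elements = pattern.replace(" ", "").split("-")
    return [set(element.strip("[]")) for element in elements]

def first_mismatch(pattern, peptide):
    """Return the zero-based index of the first mismatch, or None if all match."""
    allowed = parse_pattern(pattern)
    if len(allowed) != len(peptide):
        raise ValueError("peptide length differs from pattern length")
    for index, (residue, choices) in enumerate(zip(peptide, allowed)):
        if residue not in choices:
            return index
    return None

# The two signature patterns quoted in the text.
PATTERN_1 = "R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]"
PATTERN_2 = ("Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-"
             "[TAN]-[SAD]")

# The pattern lengths agree with the 11 and 20 residues stated in the text.
print(len(parse_pattern(PATTERN_1)), len(parse_pattern(PATTERN_2)))

# Illustrative peptides: one that fits pattern 1, one with a Pro at index 3.
print(first_mismatch(PATTERN_1, "RNLLSVAYKNV"))
print(first_mismatch(PATTERN_1, "RNLPSVAYKNV"))
```

Reporting the mismatch position, rather than a plain yes or no, makes it easier to see which conserved residue a near-miss sequence violates.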

[ 1] Aitken A.
Trends Biochem. Sci. 20:95-97(1995).
[ 2] Morrison D.
Science 266:56-57(1994).
[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A.,
Gamblin S.J.
Nature 376:188-191(1995).

733. D-isomer specific 2-hydroxyacid dehydrogenases (2 Hacid DH)
This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate
dehydrogenase families in SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases which
seem to be specific for the D-isomer of their substrate have been shown [1,2,3,4] to be functionally and
structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the
oxidation of D-lactate to pyruvate.

- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate
reductase), a plant leaf peroxisomal enzyme that catalyzes the reduction of
hydroxypyruvate to glycerate. This reaction is part of the glycolate pathway of
photorespiration.

- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum
and Methylobacterium extorquens.

- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that
catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate.
This reaction is the first committed step in the 'phosphorylated' pathway of serine
biosynthesis.

- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial
enzyme involved in the biosynthesis of pyridoxine (vitamin B6).

- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial
enzyme that catalyzes the reversible and stereospecific interconversion between
2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids.

- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp.
101 and various fungi [5].

- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a
D-specific alpha-keto acid dehydrogenase involved in the formation of a
peptidoglycan which does not terminate in D-alanine, thus preventing
vancomycin binding.

- Escherichia coli hypothetical protein ycdW.
- Escherichia coli hypothetical protein yiaE.
- Haemophilus influenzae hypothetical protein HI1556.
- Yeast hypothetical protein YER081w.
- Yeast hypothetical protein YIL074w.

All these enzymes have similar enzymatic activities and are structurally related. Three
of the most conserved regions of these proteins have been selected to develop patterns. The
first pattern is based on a glycine-rich region located in the central section of these enzymes;
this region probably corresponds to the NAD-binding domain. The two other patterns contain
a number of conserved charged residues, some of which may play a role in the catalytic
mechanism.



-Consensus pattern: fWMAffUVMA SEP F. P NO :3 () )]-[AG]-[IVT]-IW-!Vff¥il LIVMF Y 
SEP fP NO:lSn-rAGl-x-G-R4tfj^9GgAG4rNHKRQOSAC SEP IP NO:653)HLIV]-G- 

X ( 1 3 , 1 4)-MVfMTii;u vfMi ...SEQ.ID MOMm-^-^ : ^ : mm^£i)±mQ.m. 



1 0 -Consensus pattern: j'f. JVN4FY WA j f I J V MF Y W A SEP IP NO:4i )| - 

NO:658)]-fl^A]rrWASEQIDN 

SEQJDML^^ 
I€SDN4jGSDN SEP IP NO:66 0,V| 
15 -Consensus pattern: jfeMFA^ fLMFATC SEP IP NQ:66nH KPQ1-x-tWft>NjfGSTPN 

I^^F :i A¥4{LIVJMFYW 

fGPl-x-fWW rLJVH SEP IP NP:663) j -fWM€4 fI,jVMC SEP ID NO:i42Y| -[DNV] 

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).
[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys.
Res. Commun. 184:60-66(1992).
[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).
[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).
[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain) 
Refined crystal structure of the catalytic domain of dihydrolipoyl
transacetylase (E2P) from Azotobacter vinelandii at 2.6 angstroms
resolution.
Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;
J Mol Biol 1993;230:1183-1199.

These proteins contain one to three copies of a lipoyl binding domain
followed by the catalytic domain.

735. 3-beta hydroxysteroid dehydrogenase/isomerase family

Structure and tissue-specific expression of 3
beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase
genes in human and rat classical and peripheral
steroidogenic tissues.
Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al;
J Steroid Biochem Mol Biol 1992;41:421-435.

The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene
isomerase (3 beta-HSD) catalyzes the oxidation and isomerization
of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene
steroid precursors into the corresponding 4-ene-ketosteroids necessary
for the formation of all classes of steroid hormones.

736. 3-hydroxyacyl-CoA dehydrogenase

This family also includes lambda crystallin.

Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase:
preliminary chain tracing at 2.8-A resolution.
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;
Proc Natl Acad Sci USA 1987;84:8262-8266.

3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved
in fatty acid metabolism; it catalyzes the NAD-dependent oxidation of
3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have 2 fatty-acid
beta-oxidation systems, one located in mitochondria and the other in
peroxisomes. In peroxisomes, 3-hydroxyacyl-CoA dehydrogenase forms, with
enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a
multifunctional enzyme where the N-terminal domain bears the
hydratase/isomerase activities and the C-terminal




domain the dehydrogenase activity. There are two mitochondrial enzymes: one 
which is monofunctional and the other which is, like its peroxisomal 
counterpart, multifunctional. 

In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), HCDH is part
of a multifunctional enzyme which also contains an ECH/ECI domain as well as a
3-hydroxybutyryl-CoA epimerase domain [2].

The other proteins structurally related to HCDH are:

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which
  interconverts 3-hydroxybutanoyl-CoA and acetoacetyl-CoA [3].
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs
  (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the
HCDH family: the first, located in the N-terminal section, corresponds to the
NAD-binding site; the second is located in the center of the sequence. A
signature pattern has been derived from this central region.

-Consensus pattern: |PNE1-x(2)-rGA1-F4LI VMF Y]-[LJVM.FY SEP ID N O: 1 8)]-x-[NT]-R- 
x(3)-rPA1^I^MF^jLIV MFY S EP ID NO:1 8 )](2>- 

x(5)-fkWM*^ 
NQ;18Jl-x(2)-[GV] 

[1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J.
    Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).
[2] Nakahigashi K., Inokuchi H.
    Nucleic Acids Res. 18:4937-4937(1990).
[3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A.,
    Tabaqchali S.
    FEMS Microbiol. Lett. 124:61-67(1994).
[4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H.,
    de Jong W.W.
    J. Biol. Chem. 263:15462-15466(1988).



737. 60S acidic ribosomal protein

Proteins P1, P2, and P0, components of the eukaryotic
ribosome stalk. New structural and functional aspects.
Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R,
Rodriguez Gabriel MA, Guarinos E, Ballesta JP;
Biochem Cell Biol 1995;73:959-968.

This family includes archaebacterial L12, eukaryotic P0, P1 and P2.



738. 6-phosphogluconate dehydrogenases
6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step
in the hexose monophosphate shunt, the oxidative decarboxylation of
6-phosphogluconate into ribulose 5-phosphate.

Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose
sequences are highly conserved [1]. A region which has been shown [2], from
studies of the sheep 6PGD tertiary structure, to be involved in the binding of
6-phosphogluconate has been selected as a signature pattern.

-Consensus pattern: {W¥M][LlVM..SEQID.NO:4)j-x-D-x(2)-[GA]-[NQS 

[1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J.
    Mol. Microbiol. 5:1081-1089(1991).
[2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
    Helliwell J.R., Pickersgill R.W., White S.W.
    EMBO J. 2:1009-1014(1983).



739. (7tm 1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an
extensive group of receptors for hormones, neurotransmitters, odorants and
light which transduce extracellular signals by interaction with guanine
nucleotide-binding (G) proteins. The receptors that are currently known to
belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR-8.
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).
- Follicle stimulating hormone (FSH-R) [10].
- Galanin.
- Gastrin-releasing peptide (GRP-R).
- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.
- Thyrotropin (TSH-R) [10].
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.
- A number of orphan receptors (whose ligand is not known) from mammals and
  birds.
- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2,
  T27D1.3 and ZC84.4.
- Three putative receptors encoded in the genome of cytomegalovirus: US27,
  US28, and UL33.
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

The structure of all these receptors is thought to be identical. They have
seven hydrophobic regions, each of which most probably spans the membrane.
The N-terminus is located on the extracellular side of the membrane and is
often glycosylated, while the C-terminus is cytoplasmic and generally
phosphorylated. Three extracellular loops alternate with three intracellular
loops to link the seven transmembrane regions. Most, but not all, of these
receptors lack a signal peptide. The most conserved parts of these proteins
are the transmembrane regions and the first two cytoplasmic loops. A conserved
acidic-Arg-aromatic triplet is present in the N-terminal extremity of the
second cytoplasmic loop [15] and could be implicated in the interaction with G
proteins.
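
To illustrate the conserved triplet described above, the following minimal Python sketch scans a sequence for acidic-Arg-aromatic triplets. The residue classes used here ([DE] for acidic, [FYW] for aromatic), the function name find_triplets and the example fragment are assumptions made for illustration only; they are not the signature pattern developed for this family.

```python
import re

# Acidic-Arg-aromatic triplet: Asp/Glu, then Arg, then Phe/Tyr/Trp.
# ASSUMPTION: "aromatic" is taken to mean F, Y or W; the text does not list the residues.
# The lookahead lets overlapping candidates be reported as well.
TRIPLET = re.compile(r"(?=([DE]R[FYW]))")


def find_triplets(sequence):
    """Return (1-based position, triplet) for each acidic-Arg-aromatic triplet."""
    return [(m.start() + 1, m.group(1)) for m in TRIPLET.finditer(sequence.upper())]


# Hypothetical fragment, not a real receptor sequence.
fragment = "LAVADLLVSIDRYLAIVHPL"
print(find_triplets(fragment))  # [(11, 'DRY')]
```

A hit of this kind is only a weak indicator on its own, since three-residue motifs occur by chance in many unrelated proteins; this is why the family signature also spans most of the third transmembrane helix.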

To detect this widespread family of proteins, a pattern that contains the conserved 
triplet and that also spans the major part of the third transmembrane helix has 
been developed. 


-Consensus pattern: fGS'ft\i*VMF^^ 

m;M6)^x(2 HLr¥MNQGA] [U¥MNQ^ 

1 5 p,[yMF¥WSTAG}i;LIVMFYWSTAC SEP ID NO:660i]-fP£Ntt]i PENH SEP ID 
Nfi:670)I-R-ff^WGS« 

[1] Strosberg A.D.
    Eur. J. Biochem. 196:1-10(1991).
[2] Kerlavage A.R.
    Curr. Opin. Struct. Biol. 1:394-401(1991).
[3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C.
    DNA Cell Biol. 11:1-20(1992).
[4] Savarese T.M., Fraser C.M.
    Biochem. J. 283:1-9(1992).
[5] Branchek T.
    Curr. Biol. 3:315-317(1993).
[6] Stiles G.L.
    J. Biol. Chem. 267:6451-6454(1992).
[7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
    Trends Neurosci. 11:321-324(1988).
[8] Stevens C.F.
    Curr. Biol. 1:20-22(1991).
[9] Sakurai T., Yanagisawa M., Masaki T.
    Trends Pharmacol. Sci. 13:103-107(1992).
[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
    Biochimie 73:109-120(1991).
[11] Lancet D., Ben-Arie N.
    Curr. Biol. 3:668-674(1993).
[12] Uhl G.R., Childers S., Pasternak G.
    Trends Neurosci. 17:89-93(1994).
[13] Barnard E.A., Burnstock G., Webb T.E.
    Trends Pharmacol. Sci. 15:67-70(1994).
[14] Applebury M.L., Hargrave P.A.
    Vision Res. 26:1881-1895(1986).
[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
    Gene 98:153-159(1991).

(7tm 1) Visual pigments (opsins) retinal binding site

Visual pigments [1,2] are the light-absorbing molecules that mediate vision.
They consist of an apoprotein, opsin, covalently linked to the chromophore
cis-retinal. Vision is effected through the absorption of a photon by
cis-retinal, which is isomerized to trans-retinal. This isomerization leads to a
change of conformation of the protein. Opsins are integral membrane proteins
with seven transmembrane regions that belong to family 1 of G-protein coupled
receptors.

In vertebrates four different pigments are generally found. Rod cells, which 
mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which 
function in bright light, are responsible for color vision and contain three 
or more color pigments (for example, in mammals: red, blue and green). 

In Drosophila, the eye is composed of 800 facets or ommatidia. Each
ommatidium contains eight photoreceptor cells (R1-R8): the R1 to R6 cells are
outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6,
R7 and R8) expresses a specific opsin.

Proteins evolutionarily related to opsins include squid retinochrome, also known
as retinal photoisomerase, which converts various isomers of retinal into
11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3], a protein
that may also act in retinal isomerization.

The attachment site for retinal in the above proteins is a conserved lysine
residue in the middle of the seventh transmembrane helix. The pattern
that has been developed includes this residue.

-Consensus pattern: fLWMWA€]{U 

[STALIMR] [ST A OMR SEP I D NO:673)]-[GS A.CP>J V3[GSACPNV SEP ID NO:674)]- 
^:pA€fij[STACP SEP ID NO:384)j- 
15 x(2)- P^E-NF] I D EN P SEP ID NO :67 S)|-[AP]-x(2)-[IY] 
[K is the retinal binding site] 

[1] Applebury M.L., Hargrave P.A.
    Vision Res. 26:1881-1895(1986).
[2] Fryxell K.J., Meyerowitz E.M.
    J. Mol. Evol. 33:367-378(1991).
[3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
    Biochemistry 33:13117-13125(1994).

The following descriptions of protein family functions are not provided by the Pfam or
Prosite databases.

740. BAH

BAH domain. Number of members: 65
[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing
multiple domains including five bromodomains, a truncated HMG-box, and two repeats of a
novel domain. Nicolas RH, Goodwin GH; Gene 1996;175:233-240.
[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between
DNA methylation, replication and transcriptional regulation. Callebaut I, Courvalin J-C,
Mornon JP; FEBS Lett 1999;446:189-193.

741. ELM2. 

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of
unknown function. Number of members: 10

742. Euk porin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial
membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective
channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules [1
to 4]. The channel adopts an open conformation at low or zero membrane potential and a
closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids and its sequence is composed of between 12
and 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two
members of this family (genes POR1 and POR2); vertebrates have at least three members
(genes VDAC1, VDAC2 and VDAC3) [5].

A conserved region located at the C-terminal part of these proteins was selected as a
signature pattern.

Consensus pattemrYH1-x(2)-D-fSPeAm i SPCAD SEP ID NQ:676i] -x-[STA]-xr3V[TAG]- 

[KRH«¥MFHU^ 

[TJVMYi riJVMY SEP ID NO: 141)] 

[1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).
[3] Dihanich M. Experientia 46:146-153(1990).
[4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).
[5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

743. Glyco hydro 19

Chitinases family 19 signatures

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the
beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence
similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl
hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II)
are enzymes from plants that function in the defense against fungal and insect pathogens
by destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the
presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain (see the relevant
entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230
amino acid residues.

Two highly conserved regions were selected as signature patterns. The first one is located in
the N-terminal section and contains one of the six cysteines which are conserved in most,
if not all, of these chitinases and which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF (SEQ ID NO:2)]-x-A-x(3)-
[YF]-x(2)-F-[GSA]

Consensus patternft:vl^Mi;LIV M SE Q..1D NO:4)14GSAl-F-x-(^TAG|l S TAG SEP ID 
2 5 NO: 4)1 
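
Consensus patterns written in this notation can be turned into regular expressions mechanically. The sketch below is a minimal, hedged illustration in Python: the function name prosite_to_regex and the test sequence are hypothetical, the SEQ ID NO annotations are left out of the pattern string, and only the basic elements of the notation (single residues, x, [..] sets, {..} exclusions and repeat counts) are handled. It is applied to the first chitinase family 19 pattern above.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off a trailing repeat count such as (3) or (4,5).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                      # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} lists residues that are not allowed
        # [..] sets and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


chitinase_19_1 = "C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]"
regex = prosite_to_regex(chitinase_19_1)
print(regex)  # C.{4,5}FY[ST].{3}[FY][LIVMF].A.{3}[YF].{2}F[GSA]

# Synthetic sequence built to satisfy the pattern; not a real chitinase.
print(bool(re.search(regex, "CAAAAFYSAAAFLAAAAAYAAFG")))  # True
```

Anchors for the N- or C-terminus (written < and > in the notation) are deliberately not handled in this sketch.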

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[2] Henrissat B. Biochem. J. 280:309-316(1991).

744. MBD

Methyl-CpG binding domain
The Methyl-CpG binding domain (MBD) binds to DNA that contains one or more
symmetrically methylated CpGs [1]. DNA methylation in animals is associated with
alterations in chromatin structure and silencing of gene expression. MBD has negligible
non-specific affinity for DNA. In vitro foot-printing with MeCP2 showed the MBD can protect a
12-nucleotide region surrounding a methyl CpG pair [1]. MBDs are found in several
Methyl-CpG binding proteins and also DNA demethylase [2]. Number of members: 11

[1] Medline: 94232813. Dissection of the methyl-CpG binding domain from the chromosomal
protein MeCP2. Nan X, Meehan RR, Bird A; Nucleic Acids Res 1993;21:4886-4892.
[2] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG
DNA. Bhattacharya SK, Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

745. Peptidase C1
Eukaryotic thiol (cysteine) proteases active sites

cross-reference(s) THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS;
THIOL_PROTEASE_ASN

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which
contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is
facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic
triad. The proteases which are currently known to belong to this family are listed below
(references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15),
  and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C)
  [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol
  proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding
  domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug
  BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean
  SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2),
  chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25);
  pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44;
  rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis
  thaliana A494, RD19A and RD21A.
- House-dust mite allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3,
  cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen
  SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and
  CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a
  calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their
proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a
  glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the
  active site cysteine replaced by a serine. Rat testin should not be confused with mouse
  testin which is a LIM-domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen.
  This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active
  site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as 
signature patterns. 

Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC (SEQ ID NO:45)]-
[STAGCV (SEQ ID NO:159)] [C is the active site residue]

Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are
calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser). The residue in position
5 of the pattern is always Gly except in papaya protease IV where it is Glu.
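
The position numbers in the note above count every residue of the pattern, including each residue covered by an x(n) element. The following short Python sketch, offered only as an illustration, expands the legible start of the cysteine active site pattern (Q-x(3)-[GE]-x-C) into one entry per position; the function name pattern_positions is hypothetical, and the helper assumes fixed-length elements only.

```python
import re


def pattern_positions(pattern):
    """Map 1-based positions to pattern elements for a fixed-length pattern."""
    positions = []
    for element in pattern.split("-"):
        if "," in element:
            # A range such as x(4,5) has no single length, so positions are ambiguous.
            raise ValueError(f"variable-length element not supported: {element}")
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, count = m.group(1), int(m.group(2) or 1)
        positions.extend([core] * count)
    return {index: core for index, core in enumerate(positions, start=1)}


# Only the start of the pattern, up to the active site cysteine, is used here.
layout = pattern_positions("Q-x(3)-[GE]-x-C")
print(layout[4], layout[5], layout[7])  # x [GE] C
```

Read this way, position 4 is the last residue of x(3), which is why the pattern itself does not require the cysteine usually found there; position 5 is the [GE] element, and the active site cysteine sits at position 7.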
Consensus pattem{MVMGS : i^ 

SEQ.1X1NO 

NO: 1 62)1 (2 )-G-x- |'G S A PN H. 1|XtS ADNH SEQ ID NO: 163)1 [H is the active site residue] 
Consensus pattem[IAhGM-}[FYOT 
20 x-P4K^AG}[0 
SEQJDNGM 

N O: 2) ] [N is the active site residue] 

Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification
of peptidases [7,E1].

[1] Dufour E. Biochimie 70:1335-1342(1988).
[2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett.
    357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem.
    269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ.
    Microbiol. 59:330-333(1993).
[6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

746. Peptidase M22

Glycoprotease family signature cross-reference(s) GLYCOPROTEASE
Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase,
is a metalloprotease secreted by Pasteurella haemolytica which specifically
cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

One of the conserved regions contains two conserved histidines. It is possible
that this region is involved in coordinating a metal ion such as zinc.

2 5 Consensus pattem[KR]-EQSA^[GSAT SEP ID NQ:100)]-x(4)-|¥¥WM[FYWL}l SEP ID 

NO:273)1 -f^0NGK}[ DQNGK SEP ID NQ:2?4}] -x-P-x-|«^Mm [LIVMFY SEP rD 

NO;iM-x(3)-H- 

x(2MAG]-H-[WVM}UiYM 

Note: these proteins belong to family M22 in the classification of
peptidases [2,E1].

[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

747. SAM. SAM domain (Sterile alpha motif)

It has been suggested that SAM is an evolutionarily conserved protein binding domain that is
involved in the regulation of numerous developmental processes in diverse eukaryotes. The
SAM domain can potentially function as a protein interaction module through its ability to
homo- and heterooligomerise with other SAM domains. Number of members: 81

[1] Medline: 96100659 SAM: A novel motif in yeast sterile alpha and Drosophila
polyhomeotic proteins. Ponting CP; Prot Sci 1995;4:1928-1930.
[2] Medline: 97160498 SAM as a protein interaction domain involved in developmental
regulation. Schultz J, Ponting CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.
[3] Medline: 99101382 The crystal structure of an Eph receptor SAM domain reveals a
mechanism for modular dimerization. Stapleton D, Balan I, Pawson
T, Sicheri F; Nat Struct Biol 1999;6:44-49.

748. Tyrosinase signatures cross-reference(s) TYROSINASE_1; TYROSINASE_2
Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that catalyzes the
hydroxylation of monophenols and the oxidation of o-diphenols to o-quinones.
This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the
formation of pigments such as melanins and other polyphenolic compounds.

Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has
been shown [2] to be bound by three conserved histidine residues. The regions
around these copper-binding ligands are well conserved and also shared by some
hemocyanins, which are copper-containing oxygen carriers from the hemolymph of
many molluscs and arthropods [3,4].

At least two proteins related to tyrosinase are known to exist in mammals:

- TRP-1 (TYRP1) [5], which is responsible for the conversion of
  5,6-dihydroxyindole-2-carboxylic acid (DHICA) to indole-5,6-quinone-2-carboxylic acid.
- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase
  (EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to DHICA. TRP-2
  differs from tyrosinases and TRP-1 in that it binds two zinc ions instead
  of copper [7].

Other proteins that belong to this family are:

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation
  of mono- and o-diphenols to o-diquinones [8].
- Caenorhabditis elegans hypothetical protein C02C2.1.

Two signature patterns for tyrosinase and related proteins have been derived.
The first one contains two of the histidines that bind CuA, and is located in
the N-terminal section of tyrosinase. The second pattern contains a histidine
that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern: H-x(4,5)-F-[LIVMFTP (SEQ ID NO:625)]-x-[FW]-H-R-x(2)-
[LM]-x(3)-E
[The two H's are copper ligands]
2 0 Consensus pattemD-P-x-F-{-«'VM^ [H is 

a copper 
ligand] 
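
As a hedged illustration of how the first tyrosinase pattern marks the two CuA histidines, the Python sketch below writes that pattern as a regular expression with one named group per histidine and reports their positions. The residue set [LIVMFTP] is an assumption taken from the PROSITE entry TYROSINASE_1, since that element is not legible in this text; the function name copper_ligands and the test sequence are hypothetical.

```python
import re

# First tyrosinase signature as a regular expression; the two histidines
# are captured in named groups so their positions can be reported.
# ASSUMPTION: the set [LIVMFTP] follows the PROSITE entry TYROSINASE_1.
CUA_PATTERN = re.compile(
    r"(?P<his1>H).{4,5}F[LIVMFTP].[FW](?P<his2>H)R.{2}[LM].{3}E"
)


def copper_ligands(sequence):
    """Return 1-based positions of the two candidate CuA histidines per match."""
    return [
        (m.start("his1") + 1, m.start("his2") + 1)
        for m in CUA_PATTERN.finditer(sequence)
    ]


# Synthetic sequence built to satisfy the pattern; not a real tyrosinase.
sequence = "GG" + "HAAAAFLAWHRAALAAAE" + "GG"
print(copper_ligands(sequence))  # [(3, 12)]
```

A match only proposes candidate ligands. According to the text, CuA is bound by three conserved histidines, and this pattern covers two of them.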

[1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[2] Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).
[3] Linzen B. Naturwissenschaften 76:206-211(1989).
[4] Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).
[5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T.,
    Solano F., Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).
[6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A.,
    Hearing V. EMBO J. 11:527-535(1992).
[7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A.
    Biochem. Biophys. Res. Commun. 204:1243-1250(1994).
[8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

749. (Mur Ligase) Folylpolyglutamate synthase signatures

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism
that catalyzes ATP-dependent addition of glutamate moieties to tetrahydrofolate.

Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes.
We developed two signature patterns based on the conserved regions which are rich in
glycine residues and could play a role in the catalytic
activity and/or in substrate binding.

Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMFY (SEQ ID NO:18)]-x-[LIVM (SEQ ID NO:4)]-
[STAG (SEQ ID NO:20)]-G-T-[NK]-G-K-x-[ST]-x(7)-
[LIVM (SEQ ID NO:4)](2)-x(3)-[GSK]
Consensus pattemfWV*^ 

NO:4)1 -rGAI-G-x(2VD-x4GST]-x-[^^ jL^VM SEP ID NO:4)] (2) 

[1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med.
    Biol. 338:629-634(1993).

750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature
The majority of zinc-dependent metallopeptidases (with the notable exception of the
carboxypeptidases) share a common pattern of primary structure [1,2,3] in the part of their
sequence involved in the binding of zinc, and can be grouped together as a
superfamily, known as the metzincins, on the basis of this sequence similarity. They can be
classified into a number of distinct families [4,E1] which are listed below along with the
proteases which are currently known to belong to these families.

Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).
- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a
  role in regulating growth and differentiation of early B-lineage cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Yeast hypothetical protein YIL137c.
- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of
  an epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is
  capable of peptidase activity.

Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the
  enzyme responsible for hydrolyzing angiotensin I to angiotensin II. There are two forms
  of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.

Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic
  degradation of small peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal
  endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the
  second stage of processing of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene
  dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).
- Yeast hypothetical protein YKL134c.

Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases
  (bacillolysins) (EC 3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of
  insect antibacterial proteins, attacins and cecropins.

Family M7

- Streptomyces extracellular small neutral proteases.

Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from
  various species of Leishmania.

Family M9

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio
  alginolyticus.

Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC
  3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC
  3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34)
  (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22)
  (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage
  metalloelastase).
- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the
  embryo to digest the protective envelope derived from the egg extracellular matrix.
- Soybean metalloendoproteinase 1.

Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.
- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border
  metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone
  formation and which expresses metalloendopeptidase activity. The Drosophila homolog
  of BMP-1 is the dorsal-ventral patterning protein tolloid.
- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN
  from Strongylocentrotus purpuratus.
- Caenorhabditis elegans protein toh-2.
- Caenorhabditis elegans hypothetical protein F42A10.8.
- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE
  and HCE) from the fish Oryzias latipes. These proteases participate in the breakdown
  of the egg envelope, which is derived from the egg extracellular matrix, at the time of
  hatching.

Family Ml 2B 

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in 
hemorrhage. Examples are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 
3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72), trimerelysin I (EC
3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 

Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of
endothelin to release the active peptide.

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein 
is very probably a zinc endopeptidase. 

- Peptidase O from Lactococcus lactis (gene pepO). 

Family M27

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins 
(BoNT). These toxins are zinc proteases that block neurotransmitter release by 
proteolytic cleavage of synaptic proteins such as synaptobrevins, syntaxin and SNAP-25 
[7,8]. 


Family M30 

- Staphylococcus hyicus neutral metalloprotease. 

Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme
from Thermus aquaticus which is most active at high temperature.

Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the
anthrax toxin.

Family M35 

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various 
species of Aspergillus. 


Family M36 

- Extracellular elastinolytic metalloproteinases from Aspergillus. 

From the tertiary structure of thermolysin, the position of the residues acting as zinc
ligands and those involved in the catalytic activity are known. Two of the zinc ligands are
histidines which are very close together in the sequence; C-terminal to the first histidine is
a glutamic acid residue which acts as a nucleophile and promotes the attack of a water
molecule on the carbonyl carbon of the substrate. A signature pattern which includes the
two histidine and the glutamic acid residues is sufficient to detect this superfamily of
proteins.
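The arrangement described above (two histidine ligands close together, with the glutamate immediately after the first histidine) can be illustrated with a minimal sketch. It assumes the commonly described H-E-x-x-H spacing, which is only the core of the full signature, and the function name `find_hexxh` and the test peptide are hypothetical.

```python
import re

# Core of the zinc-binding signature: His, Glu, any two residues, His.
# The lookahead makes the scan report overlapping occurrences as well.
HEXXH = re.compile(r"(?=(HE..H))")

def find_hexxh(sequence):
    """Return one dict per H-E-x-x-H occurrence, using 1-based residue positions."""
    hits = []
    for match in HEXXH.finditer(sequence.upper()):
        start = match.start(1) + 1  # convert 0-based index to 1-based position
        hits.append({
            "his1": start,       # first zinc ligand
            "glu": start + 1,    # catalytic glutamate
            "his2": start + 4,   # second zinc ligand
            "motif": match.group(1),
        })
    return hits

if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    print(find_hexxh("AAHEAAHGG"))
```

A hit on this short core is far less specific than the full consensus pattern, so it is best treated as a pre-filter rather than as evidence of family membership.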

Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN (SEQ ID NO:679)]-x(2)-H-E-

|^P/-MF¥WG^ [The 
two H's are zinc ligands] [E is the active site residue] 
Sequences known to belong to this class detected by the pattern: ALL,
except for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 55, including Neurospora
crassa conidiation-specific protein 13 which could be a
zinc-protease.

[ 1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).

[ 2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).

[ 3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay
D.B., Stoecker W. Zoology 99:237-246(1996).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[ 5] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 6] Hite L.A., Fox J.W., Bjarnason J.B.

[ 7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).

[ 8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

751. PseudoU_synt_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon
stem and loop of transfer RNAs. Pseudouridine is an isomer of uridine
(5-(beta-D-ribofuranosyl)uracil) and is the most abundant modified nucleoside found in all
cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly
conserved aspartic acid, likely involved in catalysis. Number of members: 25

[1] Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces
cerevisiae contains one atom of zinc essential for its native conformation and tRNA
recognition. Arluison V, Hountondji C, Robert B, Grosjean H; Biochemistry 1998;37:7268-
7276.

752. EPSP synthase signatures 

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the
sixth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate
pathway) in bacteria (gene aroA), plants and fungi (where it is part of a multifunctional
enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase has been
extensively studied as it is the target of the potent herbicide glyphosate which inhibits the
enzyme.

The sequence of EPSP from various biological sources shows that the structure of the enzyme
has been well conserved throughout evolution. Two conserved regions were selected as
signature patterns. The first pattern corresponds to a region that is part of the active site and
which is also important for the resistance to glyphosate [2]. The second pattern is located in
the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.



Description of pattern(s) and/or profile(s) 

Consensus pattern: [LIVM (SEQ ID NO:4)]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-
[LIVMY (SEQ ID NO:143)]-x-[GSTA (SEQ ID NO:19)]

Consensus pattem[KR]-x-[KH]-E-[CSTHDNE>^^ SEP ID NO:4) !-x-fSTAl- 

5 [KRA]4W\^F}D^^ 

[ 1] Stallings W.C., Abdel-Megid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber
N.K., Stegeman R.A., Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc.
Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[ 2] Padgette S.R., Re D.B., Gaser C.S., Eicholtz D.A., Frazier R.B., Hironaka C.M., Levine
E.B., Shah D.M., Fraley R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

753. Glyco_hydro_18 

Glycosyl hydrolases family 18. Number of members: 173

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis
A, Tews I, Dauter Z, Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure
1994;2:1169-1180.

754. Esterase 
Putative esterase 

This family contains Esterase D Swiss:P10768. However it is not clear if all members of the 
family have the same function. This family is possibly related to the COesterase family. 
Number of members: 36

755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of
proteins that transport or detoxify heavy metals. This domain contains two conserved
cysteines that could be involved in the binding of these metals. The domain has been
termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The
human copper ATPases ATP7A and ATP7B, which are respectively involved in
Menkes' and Wilson's diseases, both contain 6 tandem copies of the
HMA domain. The copper ATPases CCC2 from budding yeast, copA from
Enterococcus faecalis and synA from Synechococcus contain one copy of the HMA
domain. The cadmium ATPases cadA from Bacillus firmus and from plasmid pI258
from Staphylococcus aureus also contain a single HMA domain, while a chromosomal
Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases
that contain the HMA domain are: fixI from Rhizobium meliloti, pacS from
Synechococcus (strain PCC 7942), Mycobacterium leprae ctpA and ctpB and
Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s)
are located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally encoded by plasmids
carried by mercury-resistant Gram-negative bacteria. Mercuric reductase is a class-1
pyridine nucleotide-disulphide oxidoreductase (see <PDOC00073>). There is
generally one HMA domain (with the exception of a chromosomal merA from
Bacillus strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by
plasmids carried by mercury-resistant Gram-negative bacteria. It seems to be a
mercury scavenger that specifically binds to one Hg(2+) ion and which passes it to
the mercuric reductase via the merT protein. The N-terminal half of merP is an HMA
domain.

- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of
copper.

The consensus pattern for HMA spans the complete domain. 

Description of pattern(s) and/or profile(s) 

3 0 Consensus pattem{MVN4[ LIVN SE P ID NO:682 Vl-x(2)-fLm4f^HLiVMFA SEP ID 

NQ:81)J-x-C-x-fSTAG€DN«^STAGCDNll SEP ID NO: 68 3Yl-C-x(3VfLI-WQ4fLIVFG 
SEQJD NQ: 684V|-x(3)-[LIV]-x(9 5 1 1 )-[I VA]-x-{- tVF¥-S^L VFYS SEQ. ID JN O MM [The 
two Cs probably bind metals] 

[ 1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[ 2] Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

756. (Peptidase M10) Matrixins cysteine switch

PROSITE cross-reference(s): CYSTEINE_SWITCH 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins
[1] (see <PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an
inactive form (zymogen) that differs from the mature enzyme by the presence of an
N-terminal propeptide. A highly conserved octapeptide is found two residues downstream of
the C-terminal end of the propeptide. This region has been shown to be involved in
autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site
zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or
'autoinhibitor region'.

A cysteine switch has been found in the following zinc proteases:



- MMP-1 (EC 3.4.24.7) (interstitial collagenase).

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin).

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3).

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].



Description of pattern(s) and/or profile(s) 

Consensus pattern: P-R-C-[GN]-x-P-[DW [C

chelates the zinc ion] 

[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol.
Chem. 263:11892-11899(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem.
266:1584-1590(1991).

[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci.
U.S.A. 89:4693-4697(1992).



757. (Peptidase S8) Serine proteases, subtilase family, active sites 
PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS,
PS00138; SUBTILASE_SER

Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is
provided by a charge relay system similar to that of the trypsin family of serine proteases
but which evolved by independent convergent evolution. The sequences around the
residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely
different from those of the analogous residues in the trypsin serine proteases and can be
used as signatures specific to that category of proteases.

The subtilase family currently includes the following proteases:

- Subtilisins (EC 3.4.21.62), these alkaline proteases from various Bacillus species have
been the target of numerous studies in the past thirty years.

- Alkaline elastase YaB from Bacillus sp. (gene ale). 

- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).

- Aqualysin I from Thermus aquaticus (gene pstI).

- AspA from Aeromonas salmonicida. 

- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).

- C5A peptidase from Streptococcus pyogenes (gene scpA). 

- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.

- Extracellular serine protease from Serratia marcescens.

- Extracellular protease from Xanthomonas campestris.

- Intracellular serine protease (ISP) from various Bacillus. 

- Minor extracellular serine protease epr from Bacillus subtilis (gene epr). 

- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr). 

- Nisin leader peptide processing protease nisP from Lactococcus lactis.

- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).

- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris. 

- Calcium-dependent protease from Anabaena variabilis (gene prcA). 

- Halolysin from halophilic bacteria sp. 172pl (gene hly). 

- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).

- Alkaline proteinase from Cephalosporium acremonium (gene alp). 

- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1). 

- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.

- KEX-1 protease from Kluyveromyces lactis. 

- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).

- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp). 

- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK). 

- Proteinase R from Tritirachium album (gene proR). 

- Proteinase T from Tritirachium album (gene proT). 

- Subtilisin-like protease III from yeast (gene YSP3).

- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea. 

- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4
protease from mammals, other vertebrates, and invertebrates. These proteases are involved
in the processing of hormone precursors at sites comprised of pairs of basic amino acid
residues [3].

- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from human.

- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two 
domains: a N-terminal subtilase catalytic domain and a C-terminal ABC transporter domain 
(see <PDOC00185>). 


Description of pattern(s) and/or profile(s) 

Consensus pattern: [STAIV (SEQ ID NO:130)]-x-[LIVMF (SEQ ID NO:2)]-
[LIVM (SEQ ID NO:4)]-D-[DSTA (SEQ ID NO:686)]-G-
[LIVMFC (SEQ ID NO:90)]-x(2,3)-[DNH] [D is the active site residue]
Consensus patternH-G-rSTMl-x-CVICl-fg^-AQG^fSTAGC SEP ID NO:45VKGS]-x- 
5 tfcm4AjLLP/N^ 

fSAQMjfSAGM SEP ID NP:688)| [H is the active site residue] 

Consensus pattern: G-T-S-x-[SA]-x-P-x(2)-[STAVC (SEQ ID NO:505)]-[AG] [S is
the active site residue]

Note: if a protein includes at least two of the three active site signatures, the probability of it
being a serine protease from the subtilase family is 100%.
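The two-of-three rule in the note above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the three regular expressions in `TOY_SIGNATURES` are placeholder stand-ins, not the actual Asp, His and Ser subtilase signatures.

```python
import re

# Placeholder regular expressions standing in for three active-site signatures.
# They are NOT the real subtilase patterns; substitute properly translated ones.
TOY_SIGNATURES = {
    "sig_a": r"AAA",
    "sig_b": r"CCC",
    "sig_c": r"GGG",
}

def signatures_found(sequence, signatures):
    """Return the names of all signatures that occur at least once in the sequence."""
    return [name for name, regex in signatures.items() if re.search(regex, sequence)]

def passes_two_of_three(sequence, signatures):
    """Apply the rule: at least two of the supplied signatures must be present."""
    return len(signatures_found(sequence, signatures)) >= 2

if __name__ == "__main__":
    print(passes_two_of_three("AAATTCCC", TOY_SIGNATURES))  # two placeholders present
    print(passes_two_of_three("AAATT", TOY_SIGNATURES))     # only one present
```

Keeping the signatures in a dictionary makes it straightforward to report which active-site residues were supported by a hit, not just whether the threshold was reached.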

Note: these proteins belong to family S8 in the classification of
peptidases [5,E1].

[ 1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-
737(1991).

[ 2] Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).

[ 3] Barr P.J. Cell 66:1-3(1991).

[ 4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).

[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

758. (SSB) Single-strand binding protein family signatures 
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the
helix-destabilizing protein, is a protein of 177 amino acids. It binds tightly, as a
homotetramer, to single-stranded DNA (ss-DNA) and plays an important role in DNA
replication, recombination and repair.

Closely related variants of SSB are encoded in the genome of a variety of large
self-transmissible plasmids. SSB has also been characterized in bacteria such as Proteus
mirabilis or Serratia marcescens.

Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in
mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB.
Proteins currently known to belong to this subfamily are listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.

- Drosophila MtSSB.

- Yeast protein RIM1.

Two signature patterns have been developed for these proteins. The first is a conserved
region in the N-terminal section of the SSBs. The second is a centrally located region which,
in Escherichia coli SSB, is known to be involved in the binding of DNA.

Description of pattern(s) and/or profile(s) 

Consensus pattemfrtV^4P}[l.JVMF S EP ID N O:2 )]-[NST]-[KRT]H:MVM-jfLI VM SEP ID 
15 NQ:4)|-x-|-«VMF-irL IV l yl F SEP ID NQ:2Vl(2)-G-[NWRK4rN HRK SEP ID NO:68»)1- 
(tiVM][UVM.SEQiDNO:4.)j- [GST]-x-[DET] 
Consensus pattemT-x-W-[HY]-[RNS]4«^ 

SEP TP NO:2)] -[FY]-{^9feR4 [NGKR SEP ID NO:690 )} 

[ 1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).

[ 2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

759. KDPG and KHG aldolases active site signatures 

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160;
ALDOLASE_KDPG_KHG_2

4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the
interconversion of 4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate.
Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-aldolase) catalyzes the
interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

These two enzymes are structurally and functionally related [1]. They are both homotrimeric
proteins of approximately 220 amino-acid residues. They are class I aldolases whose catalytic
mechanism involves the formation of a Schiff-base intermediate between the substrate and
the epsilon-amino group of a lysine residue. In both enzymes, an arginine is required for
catalytic activity.

Two signature patterns were developed for these enzymes. The first one contains the active
site arginine and the second, the lysine involved in the Schiff-base formation.

Description of pattern(s) and/or profile(s)

Consensus pattern: G-[LIVM (SEQ ID NO:4)]-x(3)-E-[LIV]-T-[LF]-R [R is the active
site residue]

Consensus pattern: G-x(3)-[LIVMF (SEQ ID NO:2)]-K-[LF]-F-P-[SA]-x(3)-G [K is
involved in Schiff-base formation]

[ 1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726;
AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728;
AP_NUCLEASE_F1_3

DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those
that generate oxygen radicals produce a variety of lesions in DNA. Amongst these is
base-loss which forms apurinic/apyrimidinic (AP) sites or strand breaks with atypical
3' termini. DNA repair at the AP sites is initiated by specific endonuclease cleavage of the
phosphodiester backbone. Such endonucleases are also generally capable of removing
blocking groups from the 3' terminus of DNA strand breaks.

AP endonucleases can be classified into two families on the basis of sequence similarity.
Family 1 groups the enzymes listed below [1].

- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA). 

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA). 

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues.
Rrp1 and arp both contain additional and unrelated sequences in their N-terminal section
(about 400 residues for Rrp1 and 270 for arp).

Three signature patterns were developed for this family of enzymes. The patterns are based
on the most conserved regions. The first pattern contains a glutamate which has been
shown [2], in the Escherichia coli enzyme, to bind a divalent metal ion such as magnesium or
manganese.

Consensus pattern: [APF]-D-[LIVMF (SEQ ID NO:2)](2)-x-[LIVM (SEQ ID
NO:4)]-Q-E-x-K [E binds a divalent metal ion]

Consensus pattern: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus patternN-x-G-x-R-|3^¥^ 
SEP ID NO:541 Vl -x-IXVI-x-S 
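Consensus patterns written in this notation can be translated mechanically into regular expressions. The sketch below is a minimal illustration: the function name `prosite_to_regex` is hypothetical, it handles only the elements used in this document (single residues, `x`, bracketed classes, braced exclusions, and `(n)` or `(m,n)` repeats), and it expects bare patterns without the parenthesized SEQ ID NO annotations or trailing bracketed comments.

```python
import re

# One pattern element: x, a residue, [allowed], or {excluded}, plus optional (n) or (m,n).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            atom = "."                        # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            atom = core                       # literal residue or [class]
        if low and high:
            atom += "{%s,%s}" % (low, high)   # variable-length repeat
        elif low:
            atom += "{%s}" % low              # fixed repeat
        parts.append(atom)
    return "".join(parts)

if __name__ == "__main__":
    regex = prosite_to_regex("D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)")
    print(regex)
    # Hypothetical peptide built to satisfy the pattern with a 7-residue spacer.
    print(bool(re.search(regex, "DSFRKAAAAAAAFSFW")))
```

The variable spacer `x(7,8)` becomes `.{7,8}`, so a peptide with only six residues between the two conserved blocks is rejected.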

[ 1] Barzilay G., Hickson I.S. BioEssays 17:713-719(1995).

[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-
386(1995).

761. (ER) Enhancer of rudimentary signature, PROSITE cross-reference(s): PS01290; ER

The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104
residues whose function is not yet clear. From an evolutionary point of view, it is highly
conserved [1] and has been found to exist in probably all multicellular eukaryotic
organisms. It has been proposed that this protein plays a role in the cell cycle.

A conserved region in the central part of the protein was selected as a signature pattern.



Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S 



Reference No. 2750-942P 




[ 1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-
195(1997).

762. (ETF alpha) Electron transfer flavoprotein alpha-subunit signature, PROSITE cross- 
reference(s): PS00696; ETF_ALPHA 

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for 
various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory 
chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha
and a beta subunit and binds one molecule of FAD per dimer. A similar system also
exists in some bacteria. 

The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the 
bacterial nitrogen fixation protein fixB which could play a role in a redox process and feed 
electrons to ferredoxin. 

Other related proteins are: 

- Escherichia coli hypothetical protein ydiR. 

- Escherichia coli hypothetical protein ygcQ. 

A highly conserved region which is located in the C-terminal section was selected as a 
signature pattern for these proteins. 

Consensus pattern: [LI]-Y-[LIVM (SEQ ID NO:4)]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-
H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995). 

763. (lectin c) C-type lectin domain signature and profile 

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041;
C_TYPE_LECTIN_2

A number of different families of proteins share a conserved domain which was first
characterized in some animal lectins and which seems to function as a calcium-dependent
carbohydrate-recognition domain [1,2,3]. This domain, which is known as the C-type lectin
domain (CTL) or as the carbohydrate-recognition domain (CRD), consists of about 110 to
130 residues. There are four cysteines which are perfectly conserved and involved in two
disulfide bonds. A schematic representation of the CTL domain is shown below.



xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
'*': position of the pattern.

The categories of proteins, in which the CTL domain has been found, are listed below. 



Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of
the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPRs
mediate the endocytosis of plasma glycoproteins from which the terminal sialic acid residue
of their carbohydrate moieties has been removed.

- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays
an essential role in the regulation of IgE production and in the differentiation of B cells.

- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could
be involved in endocytosis.

- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1,
YE1/88 (Ly-49), CD69 and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is
distantly related to other CTL-domains; it is unclear whether they are likely to bind
carbohydrates.

Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5];
these proteins are sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent
protein that binds to surfactant phospholipids and contributes to
lower the surface tension at the air-liquid interface in the alveoli of the
mammalian lung.

- Pulmonary surfactant-associated protein D (SP-D).

- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast
cell wall extract and to immune complexes through the complement component
(iC3b).

- Mannan-binding proteins (MBP) (also known as mannose-binding proteins).
MBPs bind mannose and N-acetyl-D-glucosamine in a calcium-dependent
manner.

- Bovine collectin-43 (CL-43).

Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the
interaction of leukocytes with platelets or vascular endothelium. Structurally, selectins
consist of a long extracellular domain, followed by a transmembrane region and a short
cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known
selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion
molecule-1 (LAM-1), leu-8, gp90-mel, or LECAM-1).

- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).
The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi
repeat, in their C-terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan
is a major component of the extracellular matrix of cartilaginous tissues
where it has a role in the resistance to compression.

- Brevican.

- Neurocan.

- Versican (large fibroblast proteoglycan), a large chondroitin sulfate
proteoglycan that may play a role in intercellular signalling.

In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal
domain, an Ig-like V-type region, two or four link domains (see <PDOC00955>) and up to
two EGF-like repeats.

Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of
glycoproteins by macrophages in several recognition and uptake processes.
Its extracellular section consists of a fibronectin type II domain followed
by eight tandem repeats of the CTL domain.

- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose
structure is highly similar to that of the mannose receptor.

- DEC-205 receptor. This protein is used by dendritic cells and thymic
epithelial cells to capture and endocytose diverse carbohydrate-binding
antigens and direct them to antigen-processing cellular compartments. The
DEC-205 extracellular section consists of a fibronectin type II domain followed
by ten tandem repeats of the CTL domain.

- Silk moth hemocytin, a humoral lectin which is involved in a self-defence
mechanism. It is composed of 2 FA58C domains (see <PDOC00988>), a CTL
domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).




Various other proteins that uniquely consist of a CTL domain: 

- Invertebrate soluble galactose-binding lectins. A category to which belong
a humoral lectin from a flesh fly; echinoidin, a lectin from the coelomic
fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid
of a barnacle, a lectin from the tunicate Polyandrocarpa misakiensis and a
newt oviduct lectin. The physiological importance of these lectins is not
yet known but they may play an important role in defense mechanisms.

- Pancreatic stone protein (PSP) (also known as pancreatic thread protein
(PTP) or reg), a protein that might act as an inhibitor of spontaneous
calcium carbonate precipitation.

- Pancreatitis associated protein (PAP), a protein that might be involved in
the control of bacterial proliferation.

- Tetranectin, a plasma protein that binds to plasminogen and to isolated
kringle 4.

- Eosinophil granule major basic protein (MBP), a cytotoxic protein.

- A galactose specific lectin from a rattlesnake.

- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp),
a snake venom anticoagulant protein which binds with factors IX and X in
the presence of calcium.

- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake
(PLI-A and PLI-B).

- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a
cockroach [8].

- Sea raven antifreeze protein (AFP) [9].

As a signature pattern for this domain, the C-terminal region with its three conserved
cysteines was selected.

Consensus P attemC-fLP^F¥ATGi [LIVMFYATG SEP ID NO:691 )]-x(5J 2)-[WL]-x- 
fDNS&HDNSR SEP ID NO:692)J-x(2>C-x(5,6)- 
NQ:433Y j-C [The three C's are involved in disulfide
bonds] 

Note all CTL domains have five Trp residues before the second Cys, 
with the exception of tunicate lectin and cockroach LPS-BP which 
have Leu. 

Note this documentation entry is linked to both a signature pattern 
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988). 

[2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993). 

[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993). 

[ 4] Spiess M. Biochemistry 29:10009-10018(1990). 

[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-
1615(1991).

[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).

[ 7] Lasky L.A. Science 238:964-969(1992). 

[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991). 

[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992). 

764. (SRCR) Speract receptor repeated domain signature 
PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR

The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of
500 amino acid residues [1]. Structurally it consists of a large extracellular domain of 450
residues, followed by a transmembrane region and a small cytoplasmic domain of 12 amino
acids. The extracellular domain contains four repeats of a 115 amino acid domain. There are
17 positions that are perfectly conserved in the four repeats, among them are six cysteines,
six glycines, and three glutamates.

Such a domain is also found, once, in the C-terminal section of mammalian macrophage
scavenger receptor type I [2], a membrane glycoprotein implicated in the pathologic
deposition of cholesterol in arterial walls during atherogenesis.

The signature pattern that was derived spans part of the N-terminal section of the domain and
contains 8 of the 17 conserved residues.

Consensus pattern: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
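
As an illustration of how a consensus pattern written in this notation can be applied to a sequence, the following minimal Python sketch translates the pattern syntax into a regular expression. The helper name prosite_to_regex and the test sequence are hypothetical and are not part of the original documentation; the sketch handles only single residues, x, [allowed] and {forbidden} classes, and (n) or (n,m) repeat counts.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration only. Supported elements:
    single residues, 'x', [allowed], {forbidden}, and (n) or (n,m) repeats.
    """
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    regex_parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(element_re, element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(core)
    return "".join(regex_parts)

SRCR_PATTERN = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"
SRCR_REGEX = re.compile(prosite_to_regex(SRCR_PATTERN))

# Synthetic test sequence (not a real protein) built to satisfy the pattern.
synthetic_core = ("G" + "A" * 5 + "G" + "A" * 2 + "E" + "A" * 6 + "WG" + "A" * 2
                  + "C" + "A" * 3 + "F" + "A" * 8 + "C" + "A" * 3 + "G")
synthetic = "MK" + synthetic_core + "LL"

hit = SRCR_REGEX.search(synthetic)
print(hit.start() + 1, hit.end())  # 1-based start and end of the match
```

A regular expression is used here only because the pattern notation maps onto it directly; a profile-based search, where available, is a different and more sensitive technique.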

[1] Dangott L.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A.
86:2128-2132(1989).

[2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A.,
Krieger M. Proc. Natl. Acad. Sci. U.S.A. 87:8810-8814(1990).

765. Bac_surface_Ag

Bacterial surface antigen

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87
from P. multocida, and OMP85 from N. meningitidis and N. gonorrhoeae. Number of members:
14

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of
Haemophilus influenzae. Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the
oma87 gene encoding the Pasteurella multocida 87-kilodalton outer membrane antigen.
Ruffolo CG, Adler B; Infect Immun 1996;64:3161-3167.

766. BRCA1 C Terminus (BRCT) domain 

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint
functions responsive to DNA damage. It has been suggested that the Retinoblastoma protein
contains a divergent BRCT domain; this has not been included in this family. The BRCT
domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060). This
suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul
SF, Bork P; Nature Genet 1996;13:266-268.
[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely
associated with DNA repair. Callebaut I, Mornon JP; FEBS Lett 1997;400:25-30.
[3] Medline: 97186552. A superfamily of conserved domains in DNA damage responsive cell
cycle checkpoint proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin
EV; FASEB J 1997;11:68-76.
[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, Lipman DJ; Nucleic Acids Res 1997;25:3389-3402.
[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein
interaction module. Zhang X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K,
Nash RA, Sternberg MJ, Lindahl T, Freemont PS;

767. Kappa casein 

Kappa-casein is a mammalian milk protein involved in a number of important physiological
processes. In the gut, the ingested protein is split into an insoluble peptide (para kappa-casein)
and a soluble hydrophilic glycopeptide (caseinomacropeptide). Caseinomacropeptide
is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to
ingested proteins, and inhibition of gastric pathogens. Number of members: 56

[1] Medline: 98072500. Nucleotide sequence evolution at the kappa-casein locus: evidence
for positive selection within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics
1997;147:1863-1872.

768. Chitinases family 18 active site 
PROSITE cross-reference(s): CHITINASE_18

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the
beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence
similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl
hydrolases [2,E1]. Family 18 chitinases (also known as classes III or V) comprise a variety
of proteins:

a) Chitinases from:

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).
- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.
- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.
- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).
- Mammalian di-N-acetylchitobiase, which is involved in the degradation of
asparagine-linked glycoproteins.
- Human cartilage glycoprotein Gp-39.
- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity.

Site directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a 
conserved glutamate is involved in the catalytic mechanism and probably acts as a proton 
donor. This glutamate is at the extremity of the best conserved region in these proteins. 


Consensus pattern [LI V MFYj [LIVM FY SEP TP NQ:l8)1 -rDN1-G-ftr^MFf [I,IVNfF SEQ ID 
NO:2B-[DNH [E is the active site residue] 

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[2] Henrissat B. Biochem. J. 280:309-316(1991).
[3] Watanabe T., Kohori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol.
Chem. 268:18567-18572(1993).
[4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E.
Structure 2:1169-1180(1994).
[5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure
2:1181-1189(1994).

769. gag_p17. gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the
mature immunodeficiency virus. Number of members: 1598

[1] Medline: 95055757. Three-dimensional structure of the human immunodeficiency virus
type 1 matrix protein. Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM,
Sundquist WI; J Mol Biol 1994;244:198-223.

770. GDA1/CD39 family of nucleoside phosphatases signature 
PROSITE cross-reference(s): GDA1_CD39_NTPASE

A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet
uncharacterized proteins have been found to belong to the same family [1,2]. This family
currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a Golgi
integral membrane enzyme that catalyzes the hydrolysis of GDP to GMP.
- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both
ATP and ADP to produce AMP.
- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid
cell activation antigen CD39).
- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase
hydrolyzes various nucleoside triphosphates to produce the corresponding nucleoside
mono- and diphosphates. This enzyme is secreted in the invaded host cell into the
parasitophorous vacuole, a specialized compartment where the parasite resides
intracellularly.
- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).
- Caenorhabditis elegans hypothetical protein C33H5.14.
- Caenorhabditis elegans hypothetical protein R07E4.4.
- Yeast chromosome V hypothetical protein YER005w.

The above uncharacterized proteins all seem to be membrane-bound.

All these proteins share a number of conserved domains. The best conserved of these
domains has been selected as a signature pattern. It is located in the central section of the
proteins.

Consensus pattern: [LIVM (SEQ ID NO:4)]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA (SEQ ID NO:219)]-[TAG]-x-N-[HY]

[1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).
[2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M.,
Mancilla M., Valenzuela M.A., Verjovski-Almeida S. J. Biol. Chem.
271:22139-22145(1996).

771. GTP cyclohydrolase I signatures 

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1; GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and
dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of
tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of
pteridine-containing pigments in insects.

GTP cyclohydrolase I is a protein of 190 to 250 amino acid residues. Comparison
of the sequences of the enzyme from bacterial and eukaryotic sources shows that the
structure of this enzyme has been extremely well conserved throughout evolution [1].

Two conserved regions were selected as signature patterns. The first contains a perfectly
conserved tetrapeptide which is part of the GTP-binding pocket [2]; the second region also
contains conserved residues involved in GTP binding.



Consensus pattem[DEN]-|-M-VM]{^ 

NO:694)]-[DEN]-fI^VM}[LIVM SEQ ID NQ:4)l-x(3HSTl-x-C-E- H-H 
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Consensus pattem[SA]-x-[RK]-x-QH™ 

[1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem.
Biophys. Res. Commun. 212:705-711(1995).
[2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure
3:459-466(1995).

772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into
dihydroxy valerates. This reaction is the second in the synthetic pathway of the essential
branched side chain amino acids valine and isoleucine. Number of members: 29

[1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase
complexed with NADPH, two magnesium ions and a herbicidal transition state analog
determined at 1.65 A resolution. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D,
Pebay-Peyroula E; EMBO J 1997;16:3405-3415.

773. Prokaryotic membrane lipoprotein lipid attachment site 
PROSITE cross-reference(s); PROKAR_LIPOPROTEIN 

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide,
which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The
peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pep.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding
protein. (This is the first archaebacterial protein known to be modified in such a fashion.)

From the precursor sequences of all these proteins, we derived a consensus pattern and a
set of rules to identify this type of post-translational modification.
Consensus pattern: {DERK}(6)-[LIVMFWSTAG (SEQ ID NO:352)](2)-[LIVMFYSTAGCQ (SEQ ID NO:353)]-[AGS]-C
[C is the lipid attachment site]

Additional rules:
1) The cysteine must be between positions 15 and 35 of the sequence in consideration.
2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
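
The pattern and the two additional rules can be combined in a short check. The following Python sketch is illustrative only: it assumes the consensus pattern reads {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, and the function name lipid_attachment_sites and the example sequence are hypothetical, chosen for this illustration.

```python
import re

# Assumed reading of the consensus pattern, written as a regular expression.
# A lookahead is used so that overlapping candidate sites are all reported.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(sequence: str) -> list:
    """Return 1-based positions of candidate lipid-attachment cysteines."""
    seq = sequence.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for match in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched stretch.
        cys_position = match.start() + len(match.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites

# Illustrative precursor-like sequence (an assumption made for this example).
example = "MKATKLVLGAVILGSTLLAGCSSNAKIDQ"
print(lipid_attachment_sites(example))
```

Applying rule 2 before the pattern search simply avoids scanning sequences that cannot qualify; the order of the checks does not change the result.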

[1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[3] von Heijne G. Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol.
Chem. 269:14939-14945(1994).

774. Aminoacyl-transfer RNA synthetases class-II signatures
PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1; AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate
amino acids and transfer them to specific tRNA molecules as the first step in protein
biosynthesis. In prokaryotic organisms there are at least twenty different types of
aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic
form and one mitochondrial form. While all these enzymes have a common function, they are
widely diverse in terms of subunit size and of quaternary structure.

The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine,
phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6]
and probably have a common folding pattern in their catalytic domain for the binding of
ATP and amino acid, which is different from the Rossmann fold observed for the class I
synthetases [7].

Class-II tRNA synthetases do not share a high degree of similarity; however, at least three
conserved regions are present [2,5,8]. Signature patterns from two of these regions have been
derived.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern[G STA .i J V F ] |GSTAI..VF SEP JD NO:42)l - fDENQHRKP) (DENOHRKP 
SEP ID NO:43)i -fGS'PA4 rOSTA SEP ID NO:19 )|-fM-¥MF} jLlVMF SEP ID NO:2 ij -fDEI- 
R-fL£VM 
5 flUP¥MF¥ftllYM^ 

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).
[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990).
[8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

775. X. Trans-activation protein X 

This protein is found in hepadnaviruses where it is indispensable for replication. Number of 
members: 91 


776. Thymidylate synthase active site 

Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of
dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate to
dihydrofolate. Thymidylate synthase plays an essential role in DNA synthesis and is an
important target for certain chemotherapeutic drugs.

Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in
protozoa and plants, where it exists as a bifunctional enzyme that includes a dihydrofolate
reductase domain.

A cysteine residue is involved in the catalytic mechanism (it covalently binds the
5,6-dihydro-dUMP intermediate). The sequence around the active site of this enzyme is
conserved from phages to vertebrates.

Consensus pattern: R-x(2)-[LIVM (SEQ ID NO:4)]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM (SEQ ID NO:695)]-x(3)-[QMT]-[FYW]-x-[LV]
[C is the active site residue]

[1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).
[2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

777. Glycosyl hydrolases family 31 signatures

It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the
basis of sequence similarities, classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase
active at low pH, which hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen,
maltose, and isomaltose.
- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.
- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacterium Sulfolobus
solfataricus.
- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound,
multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose. The
sucrase and isomaltase domains of the enzyme are homologous (41% amino acid identity)
and have most probably evolved by duplication.
- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.
- Yeast hypothetical protein YBR229c.
- Fission yeast hypothetical protein SpAC30D11.01c.

An aspartic acid has been implicated [4] in the catalytic activity of sucrase,
isomaltase, and lysosomal alpha-glucosidase. The region around this active residue is highly
conserved and can be used as a signature pattern. A second region, which contains two
conserved cysteines, has been used as an additional signature pattern.

Consensus pattern: [GF]-[LIVMF (SEQ ID NO:2)]-W-x-D-M-[NSA]-E [D is the
active site residue]

Consensus pattern: G-[AV]-D-[LIVMTA (SEQ ID NO:311)]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GSA]-[SA]-F-x-P-F-x-R-[DN]

[1] Henrissat B. Biochem. J. 280:309-316(1991).
[2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).
[3] Naim H.Y., Niemann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett.
294:109-112(1991).
[4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol.
Chem. 266:13507-13512(1991).

778. Urease signatures 

Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea
to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in
1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease
is a hexamer of identical chains. In bacteria [2], it consists of either two or three different
subunits (alpha, beta and gamma).

Urease binds two nickel ions per subunit; four histidines, an aspartate and a
carbamated lysine serve as ligands to these metals; an additional histidine is involved in the
catalytic mechanism [3].

As signatures for this enzyme, two regions were selected: one that contains two histidines that
bind one of the nickel ions, and the region of the active site histidine.

Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM (SEQ ID NO:4)]-D-x-H-[LIVM (SEQ ID NO:4)]-H-x(3)-P
[The two H's bind nickel]

Consensus pattern: [LIVM (SEQ ID NO:4)](2)-[CT]-H-[HN]-L-x(3)-[LIVM (SEQ ID NO:4)]-x(2)-D-[LIVM (SEQ ID NO:4)]-x-F-A
[H is the active site residue]

[1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).
[2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).
[3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

779. Tyrosine specific protein phosphatases signature and profiles 

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes
that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes
are very important in the control of cell growth, proliferation, differentiation and
transformation. Multiple forms of PTPase have been characterized and can be classified into
two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase
domain(s). The currently known PTPases are listed below:

Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like
domain (see <PDOC00566>) and could act at junctions between the membrane and
cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which
contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein
corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation
pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP
kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2
on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable length
extracellular domain, followed by a transmembrane region and a C-terminal catalytic
cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III)
repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the
PTPase domain. The first seems to have enzymatic activity, while the second is inactive but
seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is
generally conserved but some other, presumably important, residues are not.

In the following table, the domain structure of known receptor PTPases is shown:

                                         Extracellular              Intracellular
                                         Ig    FN-3   CAH   MAM     PTPase
Leukocyte common antigen (LCA) (CD45)    0     2      0     0       2
Leukocyte antigen related (LAR)          3     8      0     0
Drosophila DLAR                          3     9      0     0       2
Drosophila DPTP                          2     2      0     0       2
PTP-alpha (LRP)                          0     0      0     0       2
PTP-beta                                 0     16     0     0       1
PTP-gamma                                0     1      1     0       2
PTP-delta                                0     >7     0     0       2
PTP-epsilon                              0     0      0     0       2
PTP-kappa                                1     4      0     1       2
PTP-mu                                   1     4      0     1       2
PTP-zeta                                 0     1      1     0       2

PTPase domains consist of about 300 amino acids. There are two conserved cysteines;
the second one has been shown to be absolutely required for activity. Furthermore, a number
of conserved residues in its immediate vicinity have also been shown to be important.

A signature pattern was derived for PTPase domains centered on the active site
cysteine.

There are three profiles for PTPases: the first one spans the complete domain and is
not specific to any subtype. The second profile is specific to dual-specificity PTPases and the
third one to the PTP subfamily.

Consensus pattern {WMFMLIVME.^ 

EBgAQPflSTA GP SEP ID NO:213 >l-x 4LIVMFY1 [U. VM FY SEP ID NO: 1 8)1 [C is the 
active site residue] 

Note: the M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein
phosphatases that are not structurally related to the above PTPases.

Note: this documentation entry is linked to both a signature pattern and to profiles. As
profiles are much more sensitive than the pattern, you should use them if you have access to
the necessary software tools to do so.

[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).

780. Connexins signatures 

Gap junctions [1] are specialized regions of the plasma membrane which consist of
closely packed pairs of transmembrane channels, the connexons, through which small
molecules diffuse from a cell to a neighboring cell. Each connexon is composed of a
hexamer of an integral membrane protein which is often referred to as connexin. In a given
species there are a number of different, yet structurally related, tissue-specific forms of
connexins. The types of connexins which are currently known are listed below.
- Connexin 56 (Cx56).
- Connexin 50 (Cx50) (lens fiber protein MP70).
- Connexin 46 (Cx46) (alpha-3).
- Connexin 45 (Cx45) (alpha-6).
- Connexin 43 (Cx43) (alpha-1).
- Connexin 40 (Cx40) (alpha-5).
- Connexin 38 (Cx38) (alpha-2).
- Connexin 37 (Cx37) (alpha-4).
- Connexin 33 (Cx33) (alpha-7).
- Connexin 32 (Cx32) (beta-1).
- Connexin 31.1 (Cx31.1) (beta-4).
- Connexin 31 (Cx31) (beta-3).
- Connexin 30.3 (Cx30.3) (beta-5).
- Connexin 26 (Cx26) (beta-2).


Structurally the connexins consist of a short cytoplasmic N-terminal domain, followed 
by four transmembrane segments that delimit two extracellular and one cytoplasmic loops; 
the C-terminal domain is cytoplasmic and its length is variable (from 20 residues in Cx26 to 
260 residues in Cx56). The schematic representation of this structure is shown below. 
The sequences of the two extracellular loops are well conserved. In both loops there
are three conserved cysteines which are involved in disulfide bonds. A signature pattern
has been built from each of these two loop regions.

Consensus pattern: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in
disulfide bonds]

Consensus pattern: C-x(3,4)-P-C-x(3)-[LIVM (SEQ ID NO:4)]-[DEN]-C-[FY]-[LIVM (SEQ ID NO:4)]-[SA]-[KR]-P
[The three C's are involved in disulfide bonds]

[1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).

781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a
few residues upstream of a hydrophobic C-terminal membrane anchor region which is
followed by a cluster of basic amino acids [1]. This structure is represented in the following
schematic representation:

+--------------------------------------+-+--------+-+
| Variable length extracellular domain |H| Anchor |B|
+--------------------------------------+-+--------+-+

'H': conserved hexapeptide.
'B': cluster of basic residues.

It has been proposed that this hexapeptide sequence is responsible for a post-translational
modification necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.

Consensus pattern: L-P-x-T-G-[STGAVDE (SEQ ID NO:696)]
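
Because the hexapeptide is described as lying near the C-terminal anchor, a scan for it can reasonably be restricted to the end of the sequence. The Python sketch below is only an illustration of that idea, assuming the pattern reads L-P-x-T-G-[STGAVDE]. The function name, the 50-residue window and the test sequences are hypothetical choices for this example and do not come from the original documentation.

```python
import re

# Assumed reading of the hexapeptide pattern as a regular expression.
ANCHORING_HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

def find_anchoring_hexapeptides(sequence: str, c_terminal_window: int = 50) -> list:
    """Return (1-based start, hexapeptide) for hits within the C-terminal window.

    The window size is an arbitrary illustrative default, not a documented rule.
    """
    seq = sequence.upper()
    window_start = max(0, len(seq) - c_terminal_window)
    hits = []
    # finditer with a start position still reports coordinates in the full sequence.
    for match in ANCHORING_HEXAPEPTIDE.finditer(seq, window_start):
        hits.append((match.start() + 1, match.group()))
    return hits

# Synthetic sequence (not a real protein): hexapeptide, a hydrophobic-like
# stretch standing in for the anchor, then a cluster of basic residues.
synthetic = "M" + "S" * 60 + "LPETGE" + "A" * 20 + "KRKK"
print(find_anchoring_hexapeptides(synthetic))
```

A fuller check would also test for the hydrophobic stretch and the basic cluster that follow the hexapeptide; that is left out here to keep the sketch short.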

[1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).

782. Gamma-glutamyltranspeptidase signature

Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the
gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or
water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway
for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an
enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a
single chain precursor. The active site of GGT is known to be located in the light subunit.

The sequences of mammalian and bacterial GGT show a number of regions of high
similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert
7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid
(7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity
[3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.

One of the conserved regions corresponds to the N-terminal extremity of the mature
light chains of these enzymes. This region has been used as a signature pattern.

Consensus pattemT-[STA]-H-x-[STHW 

[STA]-x-T-x-T-p^m4} [LlVM S E P ID NO:4) 1-[NE1-x(L2)-rFYVG 

[1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).
[2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).
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783. Ferrochelatase signature 

Ferrochelatase (EC 4.99.1.1) (protoheme ferro-lyase) [1,2] catalyzes the last step in
heme biosynthesis: the chelation of a ferrous ion to protoporphyrin IX, to form protoheme.
In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane,
whose active site faces the mitochondrial matrix. The mature form of eukaryotic
ferrochelatase is composed of about 360 amino acids. In bacteria, ferrochelatase (gene hemH)
[3] is a protein of 310 to 380 amino acids.

The human autosomal dominant disease protoporphyria is due to the reduced activity
of ferrochelatase.

The signature pattern for this enzyme is based on a conserved region which contains a 
histidine residue which could be involved in binding iron. 

Consensus pattemfbtVMF^ n J VM F SEP ID N 0 : 2 ) j(2Vx- [ST] -x-H- f GS1 -fMWvffi LI VM 
15 SEQ,_ro NO S EP ID NO:69?)l-x-G-rDP]-x(l,2)-Y 

[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990). 
[ 2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991). 
[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-
398(1991).

784. Cellulose-binding domain, bacterial type 

The microbial degradation of cellulose and xylans requires several types of enzymes
such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or
xylanases (EC 3.2.1.8) [1].

Structurally, cellulases and xylanases generally consist of a catalytic domain joined 
to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or 
hydroxy-amino acids. 

The CBD of a number of bacterial cellulases has been shown to consist of about 105
amino acid residues [2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene endl) from Butyrivibrio fibrisolvens. 

- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi. 

- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.

- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.

- Endoglucanase A (gene celA) from Microbispora bispora. 

- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens. 

- Endoglucanase A (gene celA) from Streptomyces lividans.

- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.

- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens. 

- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas
fluorescens.

- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.

- Chitinase C from Streptomyces lividans.

The CBD domain is found either at the N-terminal or at the C-terminal extremity of these
enzymes. As shown in the following schematic representation, there are two conserved
cysteines in this CBD domain - one at each extremity of the domain - which have been shown
[3] to be involved in a disulfide bond. There are also four conserved tryptophan residues
which could be involved in the interaction of the CBD with polysaccharides.



xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.
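The schematic can be read programmatically to list its conserved positions. The sketch below simply treats every character other than 'x' as conserved; the positions it reports are positions within the schematic string, which is not drawn to scale against the roughly 105-residue domain, and the helper names are illustrative.

```python
# Schematic of the bacterial-type CBD as given in the text ('x' = any residue).
SCHEMATIC = "xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx"

def conserved_positions(schematic: str):
    """Return (1-based position, residue) for every non-'x' character."""
    return [(i + 1, aa) for i, aa in enumerate(schematic) if aa != "x"]

def count_residue(schematic: str, residue: str) -> int:
    """Count how many conserved positions carry the given residue."""
    return sum(1 for _, aa in conserved_positions(schematic) if aa == residue)

positions = conserved_positions(SCHEMATIC)
n_cys = count_residue(SCHEMATIC, "C")   # cysteines forming the disulfide bond
n_trp = count_residue(SCHEMATIC, "W")   # tryptophans that may contact polysaccharide
```

The counts agree with the description above: one cysteine near each end of the schematic and four tryptophans in between.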

Consensus pattern W-N-fS : i-AGR]{STAGR 
2 5 NO:699)]-S (imff^LMiFl 
SEP TP NQ:2S2)H GA1 

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data
Anal. 4:349-353(1991).

[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D.,
Kilburn D.G., Warren R.A.J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

785. Amidases signature

It has been shown [1,2,3] that several enzymes from various prokaryotic and
eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are
evolutionarily related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes
the hydrolysis of indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in
the biosynthesis of auxins from tryptophan.

- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to
be used as a sole carbon or nitrogen source.

- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene 
amdA). This enzyme hydrolyzes propionamides efficiently, and also at a lower efficiency, 
acetamide, acrylamide and indoleacetamide. 

- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis. 

- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading
enzyme EI) (gene nylA), a bacterial plasmid encoded enzyme which catalyzes the first step
in the degradation of 6-aminohexanoic acid cyclic dimer, a by-product of nylon manufacture
[4].

- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].

- Mammalian fatty acid amide hydrolase (gene FAAH) [6].

- A putative amidase from yeast (gene AMD2). 

- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD. 

All these enzymes contain in their central section a highly conserved region rich in glycine,
serine, and alanine residues. This region has been used as a signature pattern.

Consensus pattern: G-[GA]-S-[GS]-[GS]-G-x>[GSA]4GSAW}fGSAVY SEP ID NO: 700)1- 

x-fLWMlUiVM SE^ 
[DE]-x-[GA]-x-S-|W 


[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-
6773(1990).

[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys.
Acta 1088:225-233(1991).

[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990). 

[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol.
171:3187-3191(1989).

[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M.,
Soll D. Proc. Natl. Acad. Sci. U.S.A. 94:11819-11826(1997).

[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature
384:83-87(1996).

786. Glycosyl hydrolases family 10 active site 

The microbial degradation of cellulose and xylans requires several types of enzymes
such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or
xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes
(cellulases) and xylanases which, on the basis of sequence similarities, can be classified into
families. One of these families is known as the cellulase family F [3] or as the glycosyl
hydrolases family 10 [4,E1]. The enzymes which are currently known to belong to this
family are listed below.

- Aspergillus awamori xylanase A (xynA).

- Bacillus sp. strain 125 xylanase (xynA).

- Bacillus stearothermophilus xylanase. 

- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB). 

- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This
protein consists of two domains; it is the N-terminal domain, which has exoglucanase
activity, which belongs to this family.

- Caldocellum saccharolyticum xylanase A (xynA). 

- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC 
operon and is probably a xylanase. 

- Cellulomonas fimi exoglucanase/xylanase (cex). 

- Clostridium stercorarium thermostable celloxylanase.

- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ). 

- Cryptococcus albidus xylanase. 

- Penicillium chrysogenum xylanase (gene xylP).

- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).

- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of
three domains: an N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl
hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal
xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.

- Streptomyces lividans xylanase A (xlnA). 

- Thermoanaerobacter saccharolyticum endoxylanase A (xynA). 

- Thermoascus aurantiacus xylanase. 

- Thermophilic bacterium Rt8.B4 xylanase (xynA). 


One of the conserved regions in these enzymes is centered on a conserved glutamic acid 
residue which has been shown [5], in the exoglucanase from Cellulomonas fimi, to be 
directly involved in glycosidic bond cleavage by acting as a nucleophile. This region has 
been used as a signature pattern. 


Consensus pattem[GTA]-x(2H«¥N}^ 

NO:701 )1-[ST]-E4LIY]-[DN]-fWMF}[LIVMF SEP ID NO:2)1 [E is the active site residue] 

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol.
Chem. 266:15621-15625(1991).

787. Fructose-bisphosphate aldolase class-II signatures 

Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that
catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into
dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of
fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases [2],
mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent
metal ion - generally zinc - for their activity.

This family also includes the following proteins:

- Escherichia coli galactitol operon protein gatY which catalyzes the transformation of
tagatose 1,6-bisphosphate into glycerone phosphate and D-glyceraldehyde 3-phosphate.

- Escherichia coli N-acetyl galactosamine operon protein agaY which catalyzes the same
reaction as that of gatY.

As signature patterns for this class of enzyme, two conserved regions were selected. The first
pattern is located in the first half of the sequence and contains two histidine residues that have
been shown [4] to be involved in binding a zinc ion. The second is located in the C-terminal
section and contains clustered acidic residues and glycines.

Consensus pattemfPWN^^ 

NOL/CmHAP^ 

1 5 x-D-H- fGACHtfG ACH S EP ID NO: 704 V j [The two H's are zinc ligands] 
Consensus pattemftWM]ILIW 
x(2)-[GM]4QST-A-}[GSTA SEP ID NO: 19V1-X-E 

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-
1637(1989).

[ 4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993). 

788. Prolyl oligopeptidase family serine active site

The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related
peptidases whose catalytic activity seems to be provided by a charge relay system similar to
that of the trypsin family of serine proteases, but which evolved by independent convergent
evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is
an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence
of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium
meningosepticum and Aeromonas hydrophila); there is a high degree of sequence
conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves 
peptide bonds on the C-terminal side of lysyl and argininyl residues. 

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N- 
terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided 
that the penultimate residue is proline. 

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible 
for the proteolytic maturation of the alpha-factor precursor. 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2). 

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme 
catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to 
generate a N-acetylated amino acid and a protein with a free amino-terminus. 

A conserved serine residue has been shown experimentally (in E. coli protease II as well as in
pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part
of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-
terminal extremity of these enzymes (which are all proteins that contain about 700 to 800
amino acids).

Consensus patternD-x(3)-A-x(3)-[LIVMFY W 

G-G-ff7P/MF¥Wj-[ LIVMF Y^ W SEP ID NO:26 )1(2) [S is the active site residue] 

Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases
[4,E1].

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

789. Formate-tetrahydrofolate ligase signatures

Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase)
(FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential
element of various biosynthetic pathways. In many of these processes the transfers of one-
carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions
generate one-carbon derivatives of THF which can be interconverted between different
oxidation states by FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and
methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9).

In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-
tetrahydrofolate synthase (C1-THF synthase), which also catalyzes the dehydrogenase and
cyclohydrolase activities. Two forms of C1-THF synthases are known [1]: one is located in
the mitochondrial matrix, while the second one is cytoplasmic. In both forms the FTHFS
domain consists of about 600 amino acid residues and is located in the C-terminal section of
C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional
homotetrameric enzyme of about 560 amino acid residues [2].

The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature
patterns, two regions that are almost perfectly conserved were selected. The first one is a
glycine-rich segment located in the N-terminal part of FTHFS and which could be part of an
ATP-binding domain [2]. The second pattern is located in the central section of FTHFS.

2 0 Consensus pattemG-E«V-M|{UVM.S 

Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G
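One way to see how "almost perfectly conserved" this second region is: count how many distinct peptides the pattern admits. The sketch below is an illustration only; it assumes the 20 standard amino acids for 'x' and handles just the syntax that appears in this pattern (single residues, [..] sets, x, and fixed repeats). The function name `degeneracy` is made up for this example.

```python
import re

def degeneracy(pattern: str, alphabet_size: int = 20) -> int:
    """Number of distinct peptides matching a simple PROSITE-style pattern."""
    total = 1
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core = m.group(1)
        repeat = int(m.group(2) or 1)
        if core == "x":
            choices = alphabet_size      # any residue
        elif core.startswith("["):
            choices = len(core) - 2      # residues listed inside the brackets
        else:
            choices = 1                  # a single fixed residue
        total *= choices ** repeat
    return total

# Twelve positions, of which only three are variable.
n_peptides = degeneracy("V-A-T-[IV]-R-A-L-K-x-[HN]-G-G")
```

Under these assumptions the pattern admits 2 x 20 x 2 = 80 peptides out of the 20^12 possible 12-mers.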

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). 

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990). 


790. Transthyretin signatures 

Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to
transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino
acids that assembles as a homotetramer and forms an internal channel that binds thyroxine.
Transthyretin is mainly synthesized in the brain choroid plexus. In humans, variants of the
protein are associated with distinct forms of amyloidosis.

The sequence of transthyretin is highly conserved in vertebrates. A number of
uncharacterized proteins also belong to this family:

- Escherichia coli hypothetical protein yedX.

- Bacillus subtilis hypothetical protein yunM. 

- Caenorhabditis elegans hypothetical protein R09H10.3. 

- Caenorhabditis elegans hypothetical protein ZK697.8. 


Two regions were selected as signature patterns. The first, located in the N-terminal extremity,
starts with a lysine known to be involved in binding T4. The second pattern is located in the
C-terminal extremity.

Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]
Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]
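When a family has two signatures, a simple screen can require both, in the expected order. The sketch below hand-translates the two transthyretin patterns into regular expressions and checks that the N-terminal signature occurs before the C-terminal one. The function name and the test sequence are hypothetical, and a hit is only a hint of family membership, not proof.

```python
import re

# Hand translation of the two consensus patterns given above.
N_TERM = re.compile(r"[KH][IV]L[DN].{3}G.PA.{2}[IV].[IV]")
C_TERM = re.compile(r"Y[TH][IV][AP].{2}LS[PQ][FYW][GS][FY][QS]")

def has_both_signatures(sequence: str) -> bool:
    """True if the N-terminal signature is followed by the C-terminal one."""
    n_hit = N_TERM.search(sequence)
    if n_hit is None:
        return False
    # Search for the second signature only downstream of the first.
    return C_TERM.search(sequence, n_hit.end()) is not None

# Synthetic sequence built to satisfy both patterns (not a real protein).
toy = "MA" + "KILDAAAGAPAAAIAV" + "G" * 20 + "YTIAAALSPFGFQ" + "E"
result = has_both_signatures(toy)
```

Requiring the order as well as the presence of both signatures reduces chance matches compared with testing either pattern alone.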

[ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997). 

791. Dihydropteroate synthase signatures

All organisms require reduced folate cofactors for the synthesis of a variety of
metabolites. Most microorganisms must synthesize folate de novo because they lack the
active transport system of higher vertebrate cells which allows these organisms to use dietary
folates. Enzymes that are involved in the biosynthesis of folates are therefore the target of a
variety of antimicrobial agents such as trimethoprim or sulfonamides.

Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of 6-
hydroxymethyl-7,8-dihydropteridine pyrophosphate to para-aminobenzoic acid to form 7,8-
dihydropteroate. This is the second step in the three-step pathway leading from 6-
hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides,
which are substrate analogs that compete with para-aminobenzoic acid.

Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid 
residues which is either chromosomally encoded or found on various antibiotic resistance 
plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a 
multifunctional folate synthesis enzyme (gene fas) [2]. 

Two signature patterns for DHPS were developed: the first signature is located in the
N-terminal section of these enzymes, while the second signature is located in the central
section.

Consensus pattern: [LIVM (SEQ ID NO:4)]-x-[AG]-[LIVMF (SEQ ID NO:2)](2)-N-x-T-x-D-S-F-x-D-x-[SG]

Consensus pattern: [GE]-[SA]-x-[LIVM (SEQ ID NO:4)](2)-D-[LIVM (SEQ ID NO:4)]-G-[GP]-x(2)-[STA]-x-P

[ 1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-
7226(1990).

[ 2] Volpes F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-
218(1992).


792. Phosphatidylinositol 3- and 4-kinases signatures 

Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that
phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact
function of the three products of PI3-kinase - PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3) - is not
yet known, although it is proposed that they function as second messengers in cell signalling.
Currently, three forms of PI3-kinase are known:

- The mammalian enzyme which is a heterodimer of a 110 Kd catalytic chain (p110) and an
85 Kd subunit (p85) which allows it to bind to activated tyrosine protein kinases. There are at
least two different types of p110 subunits (alpha and beta).

- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation.
Both are proteins of about 280 Kd.

- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a 
protein of about 100 Kd. 

- Arabidopsis thaliana and soybean VPS34 homologs. 


Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on
phosphatidylinositol (PI) in the first committed step in the production of the second
messenger inositol-1,4,5-trisphosphate. Currently the following forms of PI4-kinases are
known:

- Human PI4-kinase alpha.

- Yeast PIK1, a nuclear protein of 120 Kd. 

- Yeast STT4, a protein of 214 Kd.

The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this
domain seems to be distantly related to the catalytic domain of protein kinases [2]. Two
signature patterns were developed from the best conserved parts of this domain.

Five additional proteins belong to this family:

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for 
the cell-cycle arrest and immunosuppressive effects of the FKBP12-rapamycin complex. 

- Yeast protein ESR1 [6] which is required for cell growth, DNA repair and meiotic 
recombination. 

- Yeast protein TEL1 which is involved in controlling telomere length.

- Yeast hypothetical protein YHR099w, a distantly related member of this family. 

- Fission yeast hypothetical protein SpAC22E12.16C. 

Consensus pattern: [LIVMFAC (SEQ ID NO:95)]-K-x(1,3)-[DEA]-[DE]-[LIVMC (SEQ ID NO:142)]-R-Q-[DE]-x(4)-Q
Consensus pattem[GS]-x-[AV]-x(3MfcW^ 

tfe p^M}[UVM.MQJD 

[ 1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea
F., Thompson A., Totty N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell
70:419-429(1992).

[ 2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N., Hall M.N. Cell
73:585-596(1993).

[ 3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science
260:88-91(1993).

[ 4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-
2361(1994).

[ 5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L.
Nature 369:756-758(1994).

[ 6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).

793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures

FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes
the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is
associated with the utilization of glycerol coupled to respiration. In Escherichia coli, two 
isozymes are known: one expressed under anaerobic conditions (gene glpA) and one in 
aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in 
the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 
1.1.1.8) [2,3]. 

These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD- 
binding domain in their N-terminal extremity. The mammalian enzyme differs from the 
bacterial or yeast proteins by having an EF-hand calcium-binding region (See 
<PDOC00018>) in its C-terminal extremity. 

Two signature patterns were developed: one based on the first half of the FAD-
binding domain and one which corresponds to a conserved region in the central part of these
enzymes.

Consensus pattem[IV]-G-G-G-x(2)-G-(^ 
x(3)-R-G 

Consensus pattern: G-G-K-x(2)-[GSTE (SEQ ID NO:705)]-Y-R-x(2)-A

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991). 

[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).

[ 3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-
14366(1994).

794. NOL1/NOP2/sun family signature

The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1) which may play a role
in the regulation of the cell cycle and the increased nucleolar activity that is associated with
cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1) which could be involved in nucleolar function 
during the onset of growth, and in the maintenance of nucleolar structure. 

- Yeast hypothetical protein YBL024w. 

- Bacterial protein sun (also known as fmu).

- Escherichia coli hypothetical protein yebU.

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24. 

- Methanococcus jannaschii hypothetical protein MJ0026. 

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a
protein of about 430 to 450 residues and MJ0026 has 274 residues. They share a conserved
central domain which contains some highly conserved regions. One of these regions was
selected as a signature pattern.

Consensus pattern: [FV]-D-[KRA]-[LIVMA (SEQ ID NO:30)]-L-x-D-[AV]-P-C-[ST]-[GA]

795. moaA / nifB / pqqE family signature 

A number of proteins involved in the biosynthesis of metallo cofactors have been
shown [1,2] to be evolutionarily related. These proteins are:

- Bacterial and archebacterial protein moaA, which is involved in the biosynthesis of the 
molybdenum cofactor (molybdopterin; MPT). 

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is
highly similar to moaA.

- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ) which is involved in the biosynthesis of the nitrogenase 
iron-molybdenum cofactor. 

- Bacterial protein pqqE which is involved in the biosynthesis of the cofactor pyrrolo- 
quinoline-quinone (PQQ). 

- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based
tungsten cofactor. 

- Caenorhabditis elegans hypothetical protein F49E2.1. 

All these proteins share, in their N-terminal region, a conserved domain that contains three
cysteines. In moaA, these cysteines have been shown [1] to be important for the biological
activity. They could be involved in the binding of an iron-sulfur cluster.



Reference No. 2750-942P 




Consensus pattem[LIV]-x(3)-C-[^ 
C [The three C's are putative Fe-S ligands 



[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).



796. Forkhead-associated (FHA) domain profile 

The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain
found in a variety of otherwise unrelated proteins. The FHA domain comprises approximately
55 to 75 amino acids and contains three highly conserved blocks separated by divergent
spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte
nuclear factor 1 (MNF1), yeast transcription factor FHL1, which probably controls pre-
mRNA processing, and yeast FKH1 and FKH2. In those proteins the FHA domain is located
N-terminal of the DNA-binding FH domain.

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which 
specifically interacts with the receptor-type Ser/Thr-kinase RLK5. In KAPP, the FHA 
domain maps to a region that interacts with the receptor-type protein kinase RLK5 only if the 
kinase is phosphorylated on serine residues [2]. 

- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA
damage: DUN1 and SPK1/SAD1 [3]. The latter is the only known protein containing two
copies of the FHA domain.

- Protein kinase cds1 from fission yeast contains a FHA domain and might be the ortholog of
SPK1.

- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.

- Human nuclear antigen Ki67 which is expressed only in proliferating cells. 

- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the
FHA domain.

- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil
region C-terminal of the FHA domain.

- Caenorhabditis elegans hypothetical protein ZK632.2. 

- Caenorhabditis elegans hypothetical protein C01G6.5.

- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the
FHA domain.

- An ORF from the bacterium Streptomyces, which is on the opposite strand of the protein
kinase pks1, overlapping the ORF of the kinase.


[ 1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995). 

[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-
795(1994).

[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

797. Ald_Xan_dh_C 

Aldehyde oxidase and xanthine dehydrogenase, C terminus 

[1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber
R; Medline: 96072968 "Crystal structure of the xanthine oxidase-related aldehyde oxido-
reductase from D. gigas." Science 1995;270:1170-1176.

Number of members: 54 

798. Glyco_hydro_38

Glycosyl hydrolases family 38 

Glycosyl hydrolases are key enzymes of carbohydrate metabolism. 

Number of members: 20 


[1] Henrissat B; Medline: 98313424; "Glycosidase families" Biochem Soc Trans 
1998;26:153-156. 

799. HECT 

HECT-domain (ubiquitin-transferase).

The name HECT comes from Homologous to the E6-AP Carboxyl
Terminus.

Number of members: 43

[1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family
of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc
Natl Acad Sci U S A 1995;92:2563-2567.

800. HRDC 
HRDC domain 

The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic
acid binding. Mutations in the HRDC domain cause human disease.

Number of members: 19

[1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative
nucleic acid-binding domain in Bloom's and Werner's syndrome helicases" Trends Biochem
Sci 1997;22:417-418.

801. Integrase 

Integrase mediates integration of a DNA copy of the viral genome into the host
chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc
binding domain. The central domain is the catalytic domain [1]. The carboxyl terminal
domain is a DNA binding domain [2].

Number of members: 581 


[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline:
95099322; "Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other
polynucleotidyl transferases." Science 1994;266:1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM,
Gronenborn AM; Medline: 95359147; "Solution structure of the DNA binding domain of
HIV-1 integrase." Biochemistry 1995;34:9826-9833.



802. lig_chan
Ligand-gated ion channel

This family includes the four transmembrane regions of the ionotropic glutamate 
receptors and NMDA receptors. 

Number of members: 128

[1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA
receptors by calcineurin." Science 1995;267:1510-1512.

803. RhoGAP
RhoGAP domain

GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.
Number of members: 97

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the
breakpoint cluster region-homology domain from phosphoinositide 3-kinase p85 alpha
subunit." Proc Natl Acad Sci U S A 1996;93:14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ,
Musacchio A, Smerdon SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-
activating domain from p50rhoGAP." Nature 1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ,
Smerdon SJ; Medline: 97404320; "Crystal structure of a small G protein in complex with the
GTPase-activating protein rhoGAP." Nature 1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its
relatives." Nature 1993;366:643-654.

804. vwd 

von Willebrand factor type D domain 


[1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth 
regulators related to connective tissue growth factor." FEBS lett 1993;327:125-130. 

Number of members: 92

805. zf-C4_Topoisom
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA
topoisomerase I is a zinc metalloprotein with three repetitive zinc-binding domains."
J Biol Chem 1988;263:15857-15859.

[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli
DNA topoisomerase I is part of a high-affinity DNA binding domain." Biochem Biophys Res
Commun 1998;251:509-514.

Number of members: 51

806. AIRC 
AIR carboxylase 

Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-
imidazole-carboxylate (AIR). This family catalyses the sixth step of de novo purine
biosynthesis. Some members of this family contain two copies of this domain.

Number of members: 35

807. Bromodomain signature and profile 

PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the 
following proteins: 

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated
factor p250) (gene CCG1). P250 is associated with the TFIID TATA-box binding protein and
seems essential for progression of the G1 phase of the cell cycle.

- Human RING3, a protein of unknown function encoded in the MHC class II locus. 



- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by 
binding specifically to phosphorylated CREB protein. 

- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper 
expression of other homeotic genes involved in pattern formation, such as Ubx. 

- Drosophila brahma protein (gene brm), a protein required for the activation of multiple
homeotic genes. 

- Mammalian homologs of brahma. In human, three brahma-like proteins are known: 
SNF2a(hBRM), SNF2b, and BRG1.

- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.

- Human peregrin (or Br140).

- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes 
including snRNAs. 

- Yeast GCN5, a general transcriptional activator operating in concert with certain other
DNA-binding transcriptional activators, such as GCN4, HAP2/3/4 or ADA2. 

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.

- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and 
ADR6/SWI1 proteins. This SWI-complex is involved in transcriptional activation. 

- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes. 

- Caenorhabditis elegans protein cbp-1.

- Yeast hypothetical protein YGR056w.

- Yeast hypothetical protein YKR008w. 

- Yeast hypothetical protein L9638.1. 

Some proteins contain a region which, while similar to some extent to a classical
bromodomain, diverges from it by either lacking part of the domain or because of an
insertion. These proteins are:

- Mammalian protein HRX (also known as All-1 or MLL), a protein involved in
translocations leading to acute leukemias and which possibly acts as a transcriptional
regulatory factor. HRX contains a region similar to the C-terminal half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has 
a 23 amino-acid insertion. 

- Yeast protein YTA7. This protein contains a region with significant similarity to the C-
terminal half of the bromodomain. As it is a member of the AAA family (see
<PDOC00572>) it is also in a functionally different context.

The above proteins generally contain a single bromodomain, but some of them contain two
copies; this is the case for BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

The exact function of this domain is not yet known but it is thought to be involved in protein-
protein interactions and it may be important for the assembly or activity of multicomponent
complexes involved in transcriptional activation.

The consensus pattern that has been developed spans a major part of the bromodomain; a 
more sensitive detection is available through the use of a profile which spans the whole 
domain. 

Consensus pattemfSTANVP][SXA^ 

[DE N OTF] [DENOTF SEP ID NO:707Yj-Y-[HFY]-x(2V 

II.IVMFYl fLIVMFY SEP ID NO:18)| -x(3)-|LIV M 1 fLlVM SEP ID NO:4Tl -xf4V 
(EIVM^LIVM SEP ID NO:4)]-x(6,8VY-x(12J3VR,WMI[LIVM SEP ID NO:4)j- 
20 x(2)<N+SA€F}[SACF.SEQJDN 

References 

[1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids
Res. 20:2603-2603(1992).
[2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C.,
Kennison J.A. Cell 68:561-572(1992). 

[3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

808. (CH) Actinin-type actin-binding domain signatures 
PROSITE cross-reference(s): PS00019; ACTININ_1, PS00020; ACTININ_2

Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of
intracellular structures [1]. The actin-binding domain of alpha-actinin seems to reside in the
first 250 residues of the protein. A similar actin-binding domain has been found in the N-
terminal region of many different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin). 

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which
may play a role in anchoring the cytoskeleton to the plasma membrane. 

- In the slime mold gelation factor (or ABP-120). 

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to
membrane glycoproteins. 

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in
that it contains two tandem copies of the actin-binding domain and that these copies are 
located in the C-terminal part of the protein. 

Two conserved regions were selected as signature patterns for this type of domain. The first of
these regions is located at the beginning of the domain, while the second one is located in the
central section and has been shown to be essential for the binding of actin.

Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N
Consensus pattemffc^ILlVMMMQ: 
20 E©AGHE^ 

fyP^£U VM. SEP ID NO:4)j-x- 

£S£AG} [DEAG SEP ID NO:71 lV l-x(4VH:rl^HLlVM SEP ID NO:4)]-x-rLMHSAGl- 
$WVM# UVM SEP TP NO:4 yi4MV^m [lJ VMT SEP IDNO: HI- W-x- [HVM-tiLiVM 
SEQ.ID.NP:4)J(2) 

[1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[3] Dubreuil R.R. BioEssays 13:219-226(1991).

809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature
PROSITE cross-reference(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein 
complexes that catalyze the terminal step in the respiratory chain: they
transfer electrons from cytochrome c or a quinol to oxygen. Some terminal
oxidases generate a transmembrane proton gradient across the plasma membrane
(prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme
complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals)
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I))
is found in all heme-copper respiratory oxidases. The presence of a bimetallic 
center (formed by a high-spin heme and copper B) as well as a low-spin heme, 
both ligated to six conserved histidine residues near the outer side of four 
transmembrane spans within CO I is common to all family members [2-4]. 

In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to
multiple terminal oxidases. The enzyme complexes vary in heme and copper
composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a
variety of environmental growth conditions [1].

Recently, a component of an anaerobic respiratory chain has also been found to
contain the copper B binding signature of this family: nitric oxide reductase
(NOR) exists in denitrifying species of Archaea and Eubacteria.

Enzymes that belong to this family are: 

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1) which uses cytochrome
c as electron donor. The electrons are transferred via copper A (Cu(A)) and
heme a to the bimetallic center of CO I that is formed by a penta-
coordinated heme a and copper B (Cu(B)). Subunit 1 contains 12
transmembrane regions. Cu(B) is said to be ligated to three of the
conserved histidine residues within the transmembrane segments 6 and 7.

- Quinol oxidase from prokaryotes that transfers electrons from a quinol to
the binuclear center of polypeptide I. This category of enzymes includes
Escherichia coli cytochrome O terminal oxidase complex which is a component
of the aerobic respiratory chain that predominates when cells are grown at
high aeration.

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in
nitrogen-fixing bacteroids living in root nodules. The high affinity for 
oxygen allows oxidative phosphorylation under low oxygen concentrations. A 
similar enzyme has been found in other purple bacteria. 

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces
nitric oxide to nitrous oxide, a step in the denitrification of nitrate to
dinitrogen. It is a heterodimer of norC and the catalytic
subunit norB. The latter contains the 6 invariant histidine residues and 12
transmembrane segments [5].

As a signature pattern the copper-binding region was used. 

Consensus pattem[YWG]-[LtV?¥W^ 
[LNP]-x-V-x(44,47)-H-H [The 
three H's are copper B ligands] 

Note: cytochrome bd complexes do not belong to this family.
[1] 

Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. 

J. Bacteriol. 176:5587-5600(1994). 

[2] 

Castresana J., Luebben M., Saraste M., Higgins D.G.

EMBO J. 13:2516-2525(1994). 

[3] 

Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983). 
[4] 

Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987). 
[5] 

Saraste M., Castresana J. 
FEBS Lett. 341:1-4(1994). 

810. (dehydrogmolyb) Eukaryotic molybdopterin oxidoreductases signature
PROSITE cross-reference(s): PS00559; MOLYBDOPTERIN_EUK 

A number of different eukaryotic oxidoreductases that require and bind a 
molybdopterin cofactor have been shown [1] to share a few regions of sequence
similarity. These enzymes are: 

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of
xanthine to uric acid with the concomitant reduction of NAD. Structurally,
this enzyme of about 1300 amino acids consists of at least three distinct
domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain
(see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal Mo-
pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into
acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its
sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate
to nitrite. Structurally, this enzyme of about 900 amino acids consists of
an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding
domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome
reductase domain.

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to 
sulfate. Structurally, this enzyme of about 460 amino acids consists of an 
N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding
domain of these enzymes. The pattern used to detect these proteins is based
on one of them. It contains a cysteine residue which could be involved in 
binding the molybdopterin cofactor. 

Consensus P attem[GA]-x(3V{^^QHTjrK:RNOH f r SEP ID NO:396)]-x(l 1,14)- 
x(2)-[DEN]-R-x(2)-[DE] 

[1]

Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle 
W.A., Bray R.C. 
Biochim. Biophys. Acta 1057:157-185(1991).

811. (DNA_ligase) ATP-dependent DNA ligase signatures 

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2

DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA
fragments by catalyzing the formation of an internucleotide ester bond between 
phosphate and deoxyribose. It is active during DNA replication, DNA repair and 
DNA recombination. There are two forms of DNA ligase: one requires ATP 
(EC 6.5.1.1), the other NAD (EC 6.5.1.2). 

Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. 
During the first step of the joining reaction, the ligase interacts with ATP 
to form a covalent enzyme-adenylate intermediate. A conserved lysine residue 
is the site of adenylation [1,2]. 

Apart from the active site region, the only conserved region common to all 
ATP-dependent DNA ligases is found [3] in the C-terminal section and contains 
a conserved glutamate as well as four positions with conserved basic residues. 

Signature patterns were developed for both conserved regions.

Consensus pattern: [EDQH (SEQ ID NO:713)]-x-K-x-[DN]-G-x-R-[GACIVM (SEQ ID NO:714)]
[K is the active site residue]

Consensus pattern: E-G-[LIVMA (SEQ ID NO:30)]-[LIVM (SEQ ID NO:4)](2)-[KR]-x(5,8)-[YW]-
[QNEK (SEQ ID NO:715)]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY (SEQ ID NO:18)]-K

Sequences known to belong to this class detected by the pattern: ALL, except
for archaebacterial DNA ligases.

[1]

Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991). 
[2] 

Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992). 
[3]

Kletzin A. 

Nucleic Acids Res. 20:5389-5396(1992). 

812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures 
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes
the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In
bacteria [1] it is associated with the utilization of glycerol coupled to
respiration. In Escherichia coli, two isozymes are known: one expressed under
anaerobic conditions (gene glpA) and one in aerobic conditions (gene glpD). In 
eukaryotes, a mitochondrial form of GPD participates in the glycerol phosphate 
shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2, 
3]. 

These enzymes are proteins of about 60 to 70 Kd which contain a probable 
FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs 
from the bacterial or yeast proteins by having an EF-hand calcium-binding 
region (See <PDOC00018>) in its C-terminal extremity. 

Two signature patterns were developed: one based on the first half of the FAD-
binding domain and one that corresponds to a conserved region in the central
part of these enzymes.

Consensus pattern: [TV]-G-G-G-x(2)-G-[STACV (SEQ ID NO:146)]-G-x-A-x-D-
x(3)-R-G

Consensus pattern: G-G-K-x(2)-[GSTE (SEQ ID NO:705)]-Y-R-x(2)-A

[1] 

Austin D., Larson T.J. 

J. Bacteriol. 173:101-107(1991). 

[2] 

Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993). 
[3] 

Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. 
J. Biol. Chem. 269:14363-14366(1994). 

813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature 
PROSITE cross-reference(s): PS01242; FPG

Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase)
(gene fpg) is a bacterial enzyme involved in DNA repair that excises
oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamido-
pyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-oxoG) residues. In addition
to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic
sites (AP sites). FPG is a monomeric protein of about 32 Kd which binds and
requires zinc for its activity.

The binding site for zinc seems to be located in the C-terminal part of the
enzyme where four conserved and essential [2] cysteines are located. A signature pattern
was developed based on this region.

Consensus pattern: C-x(2,4)-C-x-[GTAQ (SEQ ID NO:716)]-x-[IV]-x(7)-R-
[GSTAN (SEQ ID NO:296)]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]

[1]

Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S. 
Microbiology 141:411-417(1995). 
[2]

O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J. 
J. Biol. Chem. 268:9063-9070(1993). 

814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature 
PROSITE cross-reference(s): PS00462; G_GLU_TRANSPEPTIDASE

Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of 
the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino 
acid, a peptide or water (forming glutamate). GGT plays a key role in the 
gamma-glutamyl cycle, a pathway for the synthesis and degradation of
glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of
two polypeptide chains, a heavy and a light subunit, processed from a single 
chain precursor. The active site of GGT is known to be located in the light 
subunit. 

The sequences of mammalian and bacterial GGT show a number of regions of 
high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that 
convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 
7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related
to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases
are also composed of two subunits.

One of the conserved regions corresponds to the N-terminal extremity of the
mature light chains of these enzymes. This region was used as a signature
pattern.

Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA (SEQ ID NO:30)]-x(4)-G-[SN]-x-V-
[STA]-x-T-x-T-[LIVM (SEQ ID NO:4)]-[NE]-x(1,2)-[FY]-G

[1] 

Tate S.S., Meister A. 
Meth. Enzymol. 113:400-419(1985).
[2] 

Suzuki H., Kumagai H., Echigo T., Tochikura T. 

J. Bacteriol. 171:5169-5172(1989).

[3] 

Ishiye M., Niwa M.

Biochim. Biophys. Acta 1132:233-239(1992).

815. G-protein gamma subunit profile 

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in 
the transduction of signals generated by transmembrane receptors. G proteins 
consist of three subunits (alpha, beta, and gamma). The alpha subunit binds to 
and hydrolyzes GTP; the functions of the beta and gamma subunits are less 
clear but they seem to be required for the replacement of GDP by GTP as well
as for membrane anchoring and receptor recognition. 

The gamma subunits are small proteins (from 70 to 110 residues) that are
bound to the membrane via an isoprenyl group (either a farnesyl or a geranyl-
geranyl) covalently linked to their C-terminus. In mammals there are at least
12 different isoforms of gamma subunits.

The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein 
signalling, contains a G-protein gamma-like domain. 

A profile was developed that spans the complete length of the gamma 
subunit. 

[1]

Pennington S.R. 

Protein Prof. 2:16-315(1995). 



816. GNS1/SUR4 family signature

PROSITE cross-reference(s): PS01188; GNS1_SUR4



The following group of eukaryotic integral membrane proteins, whose exact 
function has not yet clearly been established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan. 

- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose- 
signaling pathway that controls the expression of several genes that are 
transcriptionally regulated by glucose. 

- Yeast hypothetical protein YJL196c.

- Caenorhabditis elegans hypothetical protein C40H1.4. 

- Caenorhabditis elegans hypothetical protein D2024.3. 

The proteins have from 290 to 435 amino acid residues. Structurally, they seem
to be formed of three sections: an N-terminal region with two transmembrane
domains, a central hydrophilic loop and a C-terminal region that contains from
one to three transmembrane domains. A conserved region that contains three histidines was
selected as a signature pattern. This region is located in the
hydrophilic loop.

Consensus pattern: L-x-F-L-H-x-Y-H-H



[1]

Bairoch A.
Unpublished observations (1996).
[2] 

El-Sherbeini M., Clemas J.A. 

J. Bacteriol. 177:3227-3234(1995).
[3]

Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F. 
J. Biol. Chem. 269:18076-18082(1994). 

817. Immunoglobulins and major histocompatibility complex proteins signature
PROSITE cross-reference(s): PS00290; IG_MHC 

The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two 
light chains and two heavy chains linked by disulfide bonds. There are two 
types of light chains: kappa and lambda, each composed of a constant domain
(CL) and a variable domain (VL). There are five types of heavy chains: alpha,
delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and
three (in alpha, delta and gamma) or four (in epsilon and mu) constant
domains (CH1 to CH4).

The major histocompatibility complex (MHC) molecules are made of two chains. 
In class I [2] the alpha chain is composed of three extracellular domains, a 
transmembrane region and a cytoplasmic tail. The beta chain (beta-2- 
microglobulin) is composed of a single extracellular domain. In class II [3], 
both the alpha and the beta chains are composed of two extracellular domains,
a transmembrane region and a cytoplasmic tail. 

It is known [4,5] that the Ig constant chain domains and a single
extracellular domain in each type of MHC chain are related. These
homologous domains are approximately one hundred amino acids long and
include a conserved intradomain disulfide bond. A small pattern
around the C-terminal cysteine involved in this disulfide bond can be used to detect
this category of Ig-related proteins.

Consensus pattern: [FY]-x-C-x-[VA]-x-H

Sequences known to belong to this class detected by the pattern:

- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.
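
The consensus pattern [FY]-x-C-x-[VA]-x-H for this entry maps directly onto a regular expression. The following is a minimal sketch rather than a reference implementation: the function name `find_ig_mhc_sites` and the short test fragment are assumptions made for illustration and are not part of the PROSITE entry. A lookahead is used so that overlapping hits are not missed, and the position of the conserved cysteine is reported.

```python
import re

# [FY]-x-C-x-[VA]-x-H as a regular expression ("x" becomes ".").
# Wrapping the pattern in a lookahead lets finditer report overlapping hits.
IG_MHC = re.compile(r"(?=([FY].C.[VA].H))")

def find_ig_mhc_sites(sequence):
    """Return (1-based position of the cysteine, matched segment) per hit."""
    # The cysteine is the third residue of the pattern, hence start + 3.
    return [(m.start() + 3, m.group(1)) for m in IG_MHC.finditer(sequence)]

# Hypothetical fragment written for this example only.
fragment = "KHKVYACEVTHQGL"
sites = find_ig_mhc_sites(fragment)
print(sites)
```

A hit only indicates that the residues around a cysteine fit the pattern; whether that cysteine takes part in the intradomain disulfide bond has to be checked against the domain structure.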

[1] 

Gough N.

Trends Biochem. Sci. 6:203-205(1981). 
[2] 

Klein J., Figueroa F. 
Immunol. Today 7:41-44(1986). 
[3]

Figueroa F., Klein J. 
Immunol. Today 7:78-81(1986). 
[4] 

Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L. 
Nature 282:266-270(1979).
[5] 

Cushley W., Owen M.J. 
Immunol. Today 4:88-92(1983). 
[6] 

Beck S., Barrell B.G.

Nature 331:269-272(1988). 

818. (IGFBP) Insulin-like growth factor binding proteins signature 
PROSITE cross-reference(s): PS00222; IGF_BINDING

The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding 
proteins in extracellular fluids with high affinity [1,2,3]. These IGF-binding 
proteins (IGFBP) prolong the half-life of the IGFs and have been shown to 
either inhibit or stimulate the growth promoting effects of the IGFs on cells
in culture. They seem to alter the interaction of IGFs with their cell surface
receptors. There are at least six different IGFBPs and they are structurally
related.

The following growth-factor inducible proteins are structurally related to 
IGFBPs and could function as growth-factor binding proteins [4,5]: 

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10. 

- Human connective tissue growth factor (CTGF) and its mouse homolog, protein
FISP-12. 

- Vertebrate protein NOV. 

As a signature pattern a conserved cysteine-rich region located in the N-terminal
section of these proteins is used.

Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except
for IGFBP-6's.
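
As an illustration of how the cysteine-rich signature G-C-[GS]-C-C-x(2)-C-A-x(6)-C constrains a sequence, the sketch below writes it as a regular expression and reports where it matches. The function name `igfbp_signature` and both test sequences are hypothetical and were written only for this example.

```python
import re

# G-C-[GS]-C-C-x(2)-C-A-x(6)-C: each "x(n)" becomes ".{n}".
IGFBP = re.compile(r"GC[GS]CC.{2}CA.{6}C")

def igfbp_signature(sequence):
    """Return (start, end) of the first match, 1-based inclusive, or None."""
    m = IGFBP.search(sequence)
    return (m.start() + 1, m.end()) if m else None

# Hypothetical sequences written for this example only.
with_motif = "AAGCGCCPMCALGEGQSCGV"
without_motif = "AAGCGACPMCALGEGQSCGV"  # the fourth motif residue is A, not C

print(igfbp_signature(with_motif))
print(igfbp_signature(without_motif))
```

Because five cysteines sit at fixed positions in the pattern, replacing any one of them is enough to lose the match, as the second sequence shows.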

[1] 

Rechler M.M. 

Vitam. Horm. 47:1-114(1993). 
[2] 

Shimasaki S., Ling N.

Prog. Growth Factor Res. 3:243-266(1991). 
[3] 

Clemmons D.R. 

Trends Endocrinol. Metab. 1:412-417(1990). 
[4]

Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R. 

J. Cell Biol. 114:1285-1294(1991). 

[5] 
Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet
J., Perbal B. 

Mol. Cell. Biol. 12:10-21(1992). 

819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
Number of members: 34 

[1] Medline: 94329182, The crystal structure of a low-molecular-weight phosphotyrosine
protein phosphatase. Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature
1994;370:575-578.

820. (myosin_head) ATP/GTP-binding site motif A (P-loop)
PROSITE cross-reference(s): PS00017; ATP_GTP_A

From sequence comparisons and crystallographic data analysis it has been shown
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP
share a number of more or less conserved sequence motifs. The best conserved
of these motifs is a glycine-rich region, which typically forms a flexible
loop between a beta-strand and an alpha-helix. This loop interacts with one of
the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the P-loop is found. 
A number of protein families for which the relevance of the
presence of such a motif has been noted are listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>). 

- Myosin heavy chains. 

- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>). 
- Dynamins and dynamin-like proteins (see <PDOC00362>).

- Guanylate kinase (see <PDOC00670>). 

- Thymidine kinase (see <PDOC00524>). 

- Thymidylate kinase (see <PDOC01034>). 

- Shikimate kinase (see <PDOC00868>).

- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>). 

- ATP-binding proteins involved in 'active transport' (ABC transporters) [7]
(see <PDOC00185>). 

- DNA and RNA helicases [8,9,10].

- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.). 

- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).

- Nuclear protein ran (see <PDOC00859>). 

- ADP-ribosylation factors family (see <PDOC00781>). 
- Bacterial dnaA protein (see <PDOC00771>).

- Bacterial recA protein (see <PDOC00131>). 

- Bacterial recF protein (see <PDOC00539>). 

- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).

- DNA mismatch repair proteins mutS family (See <PDOC00388>). 
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is
completely different from that of the P-loop. Examples of such proteins are
the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the
case for tubulins or protein kinases. A special mention must be reserved for
adenylate kinase, in which there is a single deviation from the P-loop
pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
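
The consensus patterns in these entries use a compact notation that can be translated mechanically into a regular expression. The sketch below is a simplified illustration under stated assumptions: the helper name `prosite_to_regex` is hypothetical, it covers only single residues, [..] alternatives, {..} exclusions, the x wildcard and (n) or (n,m) repeats, and both test fragments are written for this example. The relaxed pattern ending in [STG] is likewise an assumption used only to show the adenylate kinase deviation described above; it is not a PROSITE pattern.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Handles single residues, [..] alternatives, {..} exclusions, the x
    wildcard and (n) / (n,m) repeats. Anchors and other syntax are not covered.
    """
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {AB} = any residue except A or B
        # [AB] and single letters are already valid regex fragments.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

p_loop = re.compile(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))
relaxed = re.compile(prosite_to_regex("[AG]-x(4)-G-K-[STG]"))  # illustration only

# Illustrative fragments: a Ras-like N-terminus and an adenylate-kinase-like
# loop in which Gly replaces Ser/Thr at the last position.
ras_like = "MTEYKLVVVGAGGVGKSALTIQ"
ak_like = "IILGGPGSGKGTQC"

print(p_loop.search(ras_like).group())   # strict pattern matches
print(p_loop.search(ak_like))            # strict pattern does not match
print(relaxed.search(ak_like).group())   # relaxed last position matches
```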
[1] 

Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).
[2] 

Moller W., Amons R. 
FEBS Lett. 186:1-7(1985). 
[3]

Fry D.C., Kuby S.A., Mildvan A.S. 

Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). 

[4] 

Dever T.E., Glynias M.J., Merrick W.C.

Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987). 
[5] 

Saraste M., Sibbald P.R., Wittinghofer A. 
Trends Biochem. Sci. 15:430-434(1990). 
[6]

Koonin E.V. 

J. Mol. Biol. 229:1165-1174(1993).
[7] 

Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher 
M.P.

J. Bioenerg. Biomembr. 22:571-592(1990). 
[8] 

Hodgman T.C. 

Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). 
[9]

Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P. 
Nature 337:121-122(1989).
[10] 

Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989). 

821. PE: PE family 

This family is named after a PE motif near the amino terminus of the domain. The PE family
of proteins all contain an amino-terminal region of about 110 amino acids. The carboxyl
termini of this family are variable and fall into several classes. The largest class of PE
proteins is the highly repetitive PGRS class, which has a high glycine content. The function
of these proteins is uncertain but it has been suggested that they may be related to antigenic
variation of Mycobacterium tuberculosis [1].

Number of members: 88

[1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the
complete genome sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D,
Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D,
Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd
S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.

822. (RNB) Ribonuclease II family signature

PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II

On the basis of sequence similarities, the following bacterial and eukaryotic 
proteins seem to form a family: 

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase 
II) (gene rnb) [1]. RNase II is an exonuclease involved in mRNA decay. It 
degrades mRNA by hydrolyzing single-stranded polyribonucleotides 
processively in the 3' to 5' direction.

- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be
required for the expression of virulence genes at the posttranscriptional 
level. 

- Yeast protein SSD1 (or SRK1) which is implicated in the control of the cell 
cycle G1 phase.

- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the
nucleotide-releasing activity of RCC1 on ran. 

- Fission yeast protein dis3, which is implicated in mitotic control. 

- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' 
end processing and splicing. 

- Yeast protein MSU1, which is involved in mitochondrial biogenesis.

- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to
the carbonic anhydrase inhibitor acetazolamide.

- Caenorhabditis elegans hypothetical protein F48E8.6. 

The sizes of these proteins range from 644 residues (rnb) to 1250 (SSD1). While
their sequences are highly divergent, they share a conserved domain in their
C-terminal section [4]. It is possible that this domain plays a role in a
putative exonuclease function that would be common to all these proteins. A signature pattern
was developed based on the core of this conserved domain.

Consensus P attem[HI]-[FYE]-{GSiAM-MGSTAjNl SEP IP NO:32>l-{^VMl£LiyNl SEQ.I3D 
NO:43]-x(4 T SVY-pyM^STAL SEP I P NO:473 t1-x-{ TWA CirFWACSE,Q.JP 
NO:71?Yi -rTV1- 

[SA]-P-{W-MA4[L1VMA SEP IP NP:30)!-rRQ1-rKR1-fFY1-x-D-x(3)-fHQ1 
[1] 

Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993). 
[2] 

Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N.,
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3] 

Beuf L., Bedu S., Cami B., Joset F. 
Plant Mol. Biol. 27:779-788(1995). 
[4] 

Mian I.S. 

Nucleic Acids Res. 25:3187-3195(1997). 

823. Src homology 2 (SH2) domain profile 
PROSITE cross-reference(s): PS50001; SH2

The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid 
residues first identified as a conserved sequence region between the 
oncoproteins Src and Fps [1]. Similar sequences were later found in many other 
intracellular signal-transducing proteins [2]. SH2 domains function as 
regulatory modules of intracellular signalling cascades by interacting with
high affinity to phosphotyrosine-containing target peptides in a sequence- 
specific and strictly phosphorylation-dependent manner [3,4,5,6]. 

The SH2 domain has a conserved 3D structure consisting of two alpha helices
and six to seven beta-strands. The core of the domain is formed by a 
continuous beta-meander composed of two connected beta-sheets [7]. 

So far, SH2 domains have been identified in the following proteins: 

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor)
protein tyrosine kinases, in particular the Src, Abl, Btk, Csk and ZAP70
families of kinases.

- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two
copies of the SH2 domain are found in those proteins in between the
catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).

- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.

- Some vertebrate and invertebrate protein-tyrosine phosphatases. 

- Mammalian Ras GTPase-activating protein (GAP). 

- Adaptor proteins mediating binding of guanine nucleotide exchange factors
to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 
and Drosophila DRK. 

- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the 
CDC24 family. 

- Miscellaneous proteins interacting with vertebrate receptor protein
tyrosine kinases: oncoprotein Crk, mammalian cytoplasmic proteins Nck, Shc.

- STAT proteins (signal transducers and activators of transcription). 

- Chicken tensin. 

- Yeast transcriptional control protein SPT6. 

The profile developed to detect SH2 domains is based on a structural alignment 
consisting of 8 gap-free blocks and 7 linker regions totaling 92 match 
positions. 

[1]

Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).
[2]

Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).
[3]

Marengere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).
[4]

Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).
[5]

Mayer B.J., Baltimore D.
Trends Cell. Biol. 3:8-13(1993).
[6]

Pawson T.
Nature 373:573-580(1995).
[7]

Kuriyan J., Cowburn D.
Curr. Opin. Struct. Biol. 3:828-837(1993).

824. Sulfate transporters signature
PROSITE cross-reference(s): PS01130; SULFATE_TRANSP

A number of proteins involved in the transport of sulfate across a membrane,
as well as some as yet uncharacterized proteins, have been shown [1,2] to be
evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14). 

- Yeast sulfate permeases (genes SUL1 and SUL2). 

- Rat sulfate anion transporter 1 (SAT-1). 

- Mammalian DTDST, a probable sulfate transporter which, in human, is
involved in the genetic disease diastrophic dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.

- Human pendrin (gene PDS), which is involved in a number of hearing loss
genetic diseases.

- Human protein DRA (Down-Regulated in Adenoma). 

- Soybean early nodulin 70. 

- Escherichia coli hypothetical protein ychM. 

- Caenorhabditis elegans hypothetical protein F41D9.5.

As expected from their transport function, these proteins are highly hydrophobic
and seem to contain about 12 transmembrane domains. The best conserved region
seems to be located in the second transmembrane region and is used as a
signature pattern.

Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG (SEQ ID NO:20)](2)-x(4)-
[LIVFYA (SEQ ID NO:718)]-[LIVST (SEQ ID NO:474)]-[YI]-
x(3)-[GA]-[GST]-S-[KR]

[1] 

Sandal N.N., Marcker K.A. 

Trends Biochem. Sci. 19:19-19(1994). 

[2] 

Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.

Mol. Gen. Genet. 247:709-715(1995). 

825. TYA: TYA transposon protein 

Ty are yeast transposons. A 5.7 kb transcript codes for p3, a fusion protein of TYA and TYB.
The TYA protein is analogous to the gag protein of retroviruses. TYA is cleaved to form a
46 kd protein which can form mature virion-like particles [1]. Number of members: 59

[1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon
virus-like particles. Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ,
Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.

826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain. 

-!- This family includes class II aldolases and adducins; the adducins have not been ascribed any
enzymatic function. Number of members: 37

References:

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase
from Escherichia coli. Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.
[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from
Escherichia coli as derived from the structure. Dreyer MK, Schulz GE; J Mol Biol
1996;259:458-466.

827. CBD_2 

-!- Two tryptophan residues are involved in cellulose binding. 

-!- Cellulose binding domain found in bacteria. Number of members: 51 

References:

[1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas
fimi by nuclear magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG,
Muhandiram DR, Harris-Brandts M, Carver JP, Kay LE, Harvey TS; Biochemistry
1995;34:6993-7009.

828. P 

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an 
additional highly conserved sequence of approximately 150 residues (P domain) located
immediately downstream of the catalytic domain.

Number of members: 91 



References: 
[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is
required for intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P,
Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone
convertases. Zhou A, Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem
1998;273:11107-11114.

829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020 

The following uncharacterized proteins have been shown [1] to share regions of
similarity:

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus
influenzae protein.

- Bacillus subtilis hypothetical protein ypsC.

- Synechocystis strain PCC 6803 hypothetical protein slr0064.

- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

These are hydrophilic proteins of about 40 Kd to about 80 Kd. They can be
picked up in the database by the following pattern.

Consensus pattern: D-P-[LIVMF (SEQ ID NO:2)]-C-G-[ST]-G-x(3)-[LI]-E
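
As an illustration of how such a pattern can be used to pick proteins out of a sequence database, the following is a minimal Python sketch. It assumes the pattern reads D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E and that the database is available as FASTA text; the helper names (`parse_fasta`, `scan`) and the two toy records are hypothetical and are not taken from any real database.

```python
import re

# Regular-expression form of D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
# (x(3) becomes ".{3}"; bracketed sets carry over unchanged).
UPF0020 = re.compile(r"DP[LIVMF]CG[ST]G.{3}[LI]E")

# Synthetic FASTA text, invented for this sketch.
TOY_FASTA = """>toy_hit synthetic record
MKKADPLCGS
GAAALEKK
>toy_miss synthetic record
MKKADPPCGSGAAALEKK
"""

def parse_fasta(text):
    """Return {record name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # first word of the header line
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}

def scan(text, signature=UPF0020):
    """Return {record name: 1-based start of the first match} for matching records."""
    hits = {}
    for name, seq in parse_fasta(text).items():
        match = signature.search(seq)
        if match:
            hits[name] = match.start() + 1
    return hits

print(scan(TOY_FASTA))
```

Sequence lines are joined before matching so that a signature spanning a line wrap in the FASTA file is not missed.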
References: 

[1] Bairoch A. Unpublished observations (1997).

830. Uncharacterized protein family UPF0031 signatures 

PROSITE cross-reference(s): PS01049; UPF0031_1; PS01050; UPF0031_2
The following uncharacterized proteins have been shown [1] to share regions of
similarity:

- Yeast chromosome XI hypothetical protein YKL151c. 

- Caenorhabditis elegans hypothetical protein R107.2. 

- Escherichia coli hypothetical protein yjeF.

- Bacillus subtilis hypothetical protein yxkO.

- Helicobacter pylori hypothetical protein HP1363.

- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.

- Mycobacterium leprae hypothetical protein B229_C2_201.

- Synechocystis strain PCC 6803 hypothetical protein sll1433.

- Methanococcus jannaschii hypothetical protein MJ1586.

These are proteins of about 30 to 40 Kd whose central region is well
conserved. They can be picked up in the database by the following patterns.

Consensus pattem[SAV]-[IVWHLVA]-|^ 

SEP ID NQ:719 )1 

Consensus pattem[GA]-G-x-G-D-[TV]-^ SEP ID NQ:4) 1 

831. (ACOX)
Acyl-CoA oxidase

This is a family of acyl-CoA oxidases (EC:1.3.3.6). Acyl-CoA oxidase converts acyl-CoA into
trans-2-enoyl-CoA [1].

Number of members: 39 

[1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline:
98192624. "Molecular characterization of a glyoxysomal long chain acyl-CoA oxidase that is
synthesized as a precursor of higher molecular mass in pumpkin." J Biol Chem
1998;273:8301-8307.

832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

This is a family of bifunctional enzymes catalysing the last two steps in de novo purine
biosynthesis. The bifunctional enzyme is found in both prokaryotes and eukaryotes. The
second-to-last step is catalysed by 5-aminoimidazole-4-carboxamide ribonucleotide
formyltransferase EC:2.1.2.3 (AICARFT), which catalyses the formylation of AICAR
with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is
catalysed by IMP (inosine monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase),
which cyclizes FAICAR (5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].

Number of members: 22

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y,
Nomura S, Tsukamoto I; Medline: 97473523 "Molecular cloning and expression of a rat
cDNA encoding 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP
cyclohydrolase" [published erratum appears in Gene 1998 Feb 27;208(2):337] Gene
1997;197:289-293.

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205 "The human purH gene 
product, 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP 
cyclohydrolase. Cloning, sequencing, expression, purification, kinetic analysis, and domain 
mapping." J Biol Chem 1996;271:2225-2233. 

833. (AOX)
Alternative oxidase

The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons
are transferred directly from reduced ubiquinol to oxygen, forming water [2]. This is not
coupled to ATP synthesis and is not inhibited by cyanide; the pathway is a single-step
process [1]. In rice, the transcript levels of the alternative oxidase are increased by low
temperature [1].

Number of members: 27

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 "Transcript
levels of tandem-arranged alternative oxidase genes in rice are increased by low
temperature." Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline:
96366413 "Cloning and analysis of the alternative oxidase gene of Neurospora crassa."
Genetics 1996;142:129-140.

834. (APH)

Protein kinases signatures and profile

Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108;
PROTEIN_KINASE_ST, PS00109; PROTEIN_KINASE_TYR, PS50011;
PROTEIN_KINASE_DOM

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of
proteins which share a conserved catalytic core common to both serine/threonine and tyrosine
protein kinases. There are a number of conserved regions in the catalytic domain of protein
kinases. Two of these regions have been selected to build signature patterns. The first region,
which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP
binding. The second region, which is located in the central part of the catalytic domain,
contains a conserved aspartic acid residue which is important for the catalytic activity of the
enzyme [6]; two signature patterns were derived for that region: one specific for serine/
threonine kinases and the other for tyrosine kinases. A profile was developed which is based
on the alignment in [1] and covers the entire catalytic domain.

Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH (SEQ ID NO:441)]-[SGA]-{PW}-[LIVCAT (SEQ ID NO:442)]-{PD}-x-
[GSTACLIVMFY (SEQ ID NO:443)]-x(5,18)-[LIVMFYWCSTAR (SEQ ID NO:444)]-[AIVP (SEQ ID NO:445)]-
[LIVMFAGCKR (SEQ ID NO:446)]-K [K binds ATP]

Sequences known to belong to this class detected by the pattern: the majority of known
protein kinases, but it fails to find a number of them, especially viral kinases, which are quite
divergent in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC (SEQ ID NO:6)]-x-[HY]-x-D-
[LIVMFY (SEQ ID NO:18)]-K-x(2)-N-[LIVMFYCT (SEQ ID NO:447)](3)
[D is an active site residue]

Sequences known to belong to this class detected by the pattern: most serine/threonine-
specific protein kinases, with 10 exceptions (half of them viral kinases) and also Epstein-Barr
virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the
conserved Lys and which are therefore detected by the tyrosine kinase specific pattern
described below.

Consensus pattern: [MVM£-¥€4 

[LIVMFYC (SEQ ID NO:6)](3) [D is an active site residue]

Sequences known to belong to this class detected by the pattern: tyrosine-specific
protein kinases, with the exception of human ERBB3 and mouse blk. This pattern will also
detect most bacterial aminoglycoside phosphotransferases [8,9] and herpesvirus ganciclovir
kinases [10], which are proteins structurally and evolutionarily related to protein kinases.

Sequences known to belong to this class detected by the profile: all, except for three viral
kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>) and 2-5A-
dependent ribonucleases. Sequence similarities between these two families and the eukaryotic
protein kinase family have been noticed before. It also detects Arabidopsis thaliana kinase-
like protein TMKL1, which seems to have lost its catalytic activity.

Note: if a protein analyzed includes the two protein kinase signatures, the probability of it
being a protein kinase is close to 100%.

Note: eukaryotic-type protein kinases have also been
found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.

Note: the patterns shown above have been updated since their publication in [7].

Note: this documentation entry is linked to both signature patterns and a profile. As the profile is much
more sensitive than the patterns, you should use it if you have access to the necessary
software tools to do so.
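
The first note above (two signatures present together) can be sketched in Python as follows. This is only an illustration under stated assumptions: the two regular expressions are hand transcriptions of the ATP-binding and serine/threonine active-site patterns as they are understood to read in PROSITE entries PS00107 and PS00108, and should be checked against the database before any real use; the function names and the test sequence are hypothetical.

```python
import re

# Assumed transcription of the ATP-binding region pattern (PS00107).
# {P} (any residue except P) becomes [^P]; x(5,18) becomes .{5,18}.
ATP_SITE = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD]."
    r"[GSTACLIVMFY].{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)

# Assumed transcription of the Ser/Thr active-site pattern (PS00108).
SER_THR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")

def kinase_signatures(seq):
    """Report which of the two signatures occur anywhere in the sequence."""
    return {
        "atp_binding": ATP_SITE.search(seq) is not None,
        "ser_thr_active_site": SER_THR_SITE.search(seq) is not None,
    }

def has_both_signatures(seq):
    """True only when both signatures are found, the case the note describes."""
    return all(kinase_signatures(seq).values())

# Synthetic sequence composed for this sketch: a glycine-rich stretch ending in Lys,
# a poly-Ala spacer, and an active-site-like stretch containing Asp.
toy_kinase = "LGTGSFGRVMLVKHKETGNHYAMK" + "A" * 60 + "LIYRDLKPENLLI"

print(kinase_signatures(toy_kinase))
print(has_both_signatures("A" * 100))
```

A sequence carrying only one of the two signatures is deliberately not accepted by `has_both_signatures`, since the note concerns the joint occurrence.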

References 

[1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995).

[2] Hunter T., Meth. Enzymol. 200:3-37(1991).

[3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991).

[4] Hanks S.K., Curr. Opin. Struct. Biol. 1:369-383(1991).

[5] Hanks S.K., Quinn A.M., Hunter T., Science 241:42-52(1988).

[6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S.,
Sowadski J.M., Science 253:407-414(1991).

[7] Bairoch A., Claverie J.-M., Nature 331:22(1988).

[8] Benner S., Nature 329:21-21(1987).

[9] Kirby R., J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S., Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M., Cell 67:995-1006(1991).

835. (Asp_Glu_race)

Aspartate and glutamate racemases signatures

Cross-reference(s): PS00923; ASP_GLU_RACEMASE_1, PS00924;
ASP_GLU_RACEMASE_2

Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionarily
related bacterial enzymes that do not seem to require a cofactor for their activity [1].
Glutamate racemase, which interconverts L-glutamate and D-glutamate, is required for the
biosynthesis of peptidoglycan and some peptide-based antibiotics such as gramicidin S. In
addition to characterized aspartate and glutamate racemases, this family also includes a
hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two
conserved cysteines are present in the sequence of these enzymes. They are expected to play
a role in catalytic activity by acting as bases in proton abstraction from the substrate.
Signature patterns were developed for both cysteines.

Consensus pattern: [IVA]-[LIVM (SEQ ID NO:4)]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-
[LIVFYSTANK (SEQ ID NO:720)]

Consensus pattern: [LIVM (SEQ ID NO:4)](2)-x-[AG]-C-T-[DEH]-
[LIVMFY (SEQ ID NO:18)]-[PNGRS (SEQ ID NO:721)]-x-
[LIVM (SEQ ID NO:4)]
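
Because each of the two patterns is built around one of the conserved cysteines, a scan can report where those cysteines fall. The Python sketch below assumes the patterns read [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK] and [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]; the function name and the sequence are invented for illustration.

```python
import re

# Each regex captures the conserved Cys in group 1.
# x(0,1) becomes ".?" and [LIVM](2) becomes "[LIVM]{2}".
CYS_1 = re.compile(r"[IVA][LIVM].(C).?N[ST][MSA][STH][LIVFYSTANK]")
CYS_2 = re.compile(r"[LIVM]{2}.[AG](C)T[DEH][LIVMFY][PNGRS].[LIVM]")

def conserved_cysteines(seq):
    """Return 1-based positions of the two signature cysteines (None if a pattern is absent)."""
    positions = {}
    for label, signature in (("cys_1", CYS_1), ("cys_2", CYS_2)):
        hit = signature.search(seq)
        positions[label] = hit.start(1) + 1 if hit else None
    return positions

# Synthetic sequence: both motifs separated by a poly-Gly spacer.
toy_racemase = "MK" + "VIACNTAHA" + "G" * 20 + "IVLGCTHYPLL" + "E"

print(conserved_cysteines(toy_racemase))
```

Replacing either cysteine in the toy sequence makes the corresponding entry `None`, which mirrors how a pattern centred on a catalytic residue fails when that residue is substituted.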

[1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

836. (ATP-sulfurylase) 
ATP-sulfurylase 

This family consists of ATP-sulfurylase or sulfate adenylyltransferase (EC:2.7.7.4), some of
which are part of a bifunctional polypeptide chain associated with adenosyl phosphosulphate
(APS) kinase. Both enzymes are required for PAPS (phosphoadenosine-
phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase catalyses the
synthesis of adenosine-phosphosulfate (APS) from ATP and inorganic sulphate [1].

Number of members: 37 

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz
NB; Medline: 98337975 "A member of a family of sulfate-activating enzymes causes murine
brachymorphism" [published erratum appears in Proc Natl Acad Sci U S A 1998 Sep
29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.

[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein,
PAPS synthetase, has both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.



837. (ATP-synt_F) 
ATP synthase (F/14-kDa) subunit

This family includes the 14-kDa subunit from vATPases [1], which is in the peripheral catalytic
part of the complex [2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23 

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 "The Drosophila
melanogaster gene vha14 encoding a 14-kDa F-subunit of the vacuolar ATPase." Gene
1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 "Identification of a
14-kDa subunit associated with the catalytic sector of clathrin-coated vesicle H+-ATPase." J
Biol Chem 1996;271:3324-3327.

[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968
"Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

838. (CBD_4)
Starch binding domain

Number of members: 48

839. (CbiX)

The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and
so may have a related function. Some CbiX proteins contain a striking histidine-rich region at
their C-terminus, which suggests that it might be involved in metal chelation [1].

Number of members: 6 

[1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin
(vitamin B12) biosynthesis: identification and characterization of a Bacillus megaterium cobI
operon." Biochem J 1998;335:159-166.

840. (Complex1_51K)

Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures

Cross-reference(s): PS00644; COMPLEX1_51K_1, PS00645; COMPLEX1_51K_2

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner
mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria
(as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this
bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP)
fragment of the enzyme. It seems to bind to NAD, FMN, and a 2Fe-2S cluster.

The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which
also binds to NAD, FMN, and a 2Fe-2S cluster.

- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of
sequence similarity. The first one most probably corresponds to the NAD-binding site, the
second to the FMN-binding site, and the third one, which contains three cysteines, to the iron-
sulfur binding region. Signature patterns have been developed for the FMN-binding and for
the 2Fe-2S binding regions.



Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM (SEQ ID NO:4)]-C-G-[DE](2)-
[STA](2)-[LIM](2)-[EN]-S

Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands] 
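
Consensus patterns written in this notation can be turned into ordinary regular expressions mechanically. The following Python sketch shows one way to do so and applies it to the 2Fe-2S pattern above; the function name `prosite_to_regex` and the test sequence are hypothetical, and the converter handles only the elements used in this document (single residues, x, [..] sets, {..} exclusions, and (n) or (n,m) repeats).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate an optional repeat count such as (2) or (5,18) from the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                         # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"      # excluded residues
        else:
            regex = re.escape(core)              # a single fixed residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

FES_PATTERN = "E-S-C-G-x-C-x-P-C-R-x-G"
fes_regex = prosite_to_regex(FES_PATTERN)

# Synthetic sequence containing one occurrence of the motif.
toy = "MKT" + "ESCGKCTPCREG" + "AAL"
hit = re.search(fes_regex, toy)

# 1-based positions of the three cysteines inside the match (the putative ligands).
cys_positions = [hit.start() + i + 1 for i, aa in enumerate(hit.group()) if aa == "C"]
print(fes_regex, cys_positions)
```

The same converter accepts exclusion and range elements, for example `prosite_to_regex("[LIV]-G-{P}-x(5,18)-K")`.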

[1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).
[2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).
[3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).
[4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-
122(1993).

841. (DAP_epimerase)

Diaminopimelate epimerase signature

Cross-reference(s): PS01326; DAP_EPIMERASE

Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-
diaminopimelate in the biosynthetic pathway leading from aspartate to lysine. This enzyme is
a protein of about 30 Kd. Two conserved cysteines seem [1] to function as the acid and base
in the catalytic mechanism. As a signature pattern, the region surrounding the first of these
two active site cysteines was selected.

Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue]

Sequences known to belong to this class detected by the pattern: all, except for an Anabaena dapF
which has a Ser instead of the active site Cys.
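
The exception noted above follows directly from the pattern requiring Cys at a fixed position. A minimal Python sketch, using invented sequences rather than real dapF proteins, shows the effect of a Cys to Ser substitution on detection; the function name is hypothetical.

```python
import re

# N-x-D-G-S-x(4)-C-G-N-[GA]-x-R, with the active-site Cys captured as group 1.
DAPF_SIGNATURE = re.compile(r"N.DGS.{4}(C)GN[GA].R")

def active_site_cys(seq):
    """Return the 1-based position of the signature Cys, or None if the pattern is absent."""
    hit = DAPF_SIGNATURE.search(seq)
    return hit.start(1) + 1 if hit else None

# Synthetic sequences: one with the full motif, one with Ser in place of the Cys.
typical = "AAANADGSAAAACGNGARAAA"
ser_variant = typical.replace("CGNG", "SGNG")

print(active_site_cys(typical))
print(active_site_cys(ser_variant))
```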

[1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).

842. (DNA_gyraseB_C)

DNA topoisomerase II signature

Cross-reference(s): PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP-
dependent and act by passing a DNA segment through a transient double-strand break.
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in
African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three
subunits (the products of genes 39, 52 and 60). In prokaryotes and in archaebacteria the
enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In
some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two
subunits (genes parC and parE). In eukaryotes, type II topoisomerase is a homodimer.

There are many regions of sequence homology between the different subtypes of
topoisomerase II. The relation between the different subunits is shown in the following
representation:

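<---------------- About 1400 residues ---------------->

[------ Protein 39 -*------][------ Protein 52 ------]   Phage T4
[------ gyrB -*------------][------ gyrA ------------]   Prokaryote II, Archaebacteria
[------ parE -*------------][------ parC ------------]   Prokaryote IV
[----------------------*-----------------------------]   Eukaryote and ASF

'*': Position of the pattern.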

As a signature pattern for this family of proteins, a region that contains a highly conserved 
pentapeptide was selected. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 

Consensus pattern: [LIVMA (SEQ ID NO:30)]-x-E-G-[DN]-S-A-x-[STAG (SEQ ID NO:20)]

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).
[2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).
[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).
[4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

843. (DUF16)

Protein of unknown function

The function of this protein is unknown. It appears to occur only in Mycoplasma
pneumoniae.

Number of members: 26 

[1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885
"Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae." 
Nucleic Acids Res 1996;24:4420-4449. 

844. (DUF21)

Domain of unknown function

This transmembrane region has no known function. Many of the sequences in this family are
annotated as hemolysins; however, this is due to a similarity to Swiss:Q54318, which does not
contain this domain. This domain is found in the N-terminus of the proteins, adjacent to two
intracellular CBS domains.

Number of members: 42 

845. (DUF56) 

Integral membrane protein 

The members of this family are putative integral membrane proteins. The function of the
family is unknown; however, the family includes Sec59 from yeast. Sec59 is a dolichol
kinase (EC:2.7.1.108), but it is not clear if the enzymatic activity resides in this region or its
N-terminal region.

Number of members: 13

846. (DUF94) 

Domain of unknown function 

The function of this domain is unknown. It is found in both eukaryotes and archaebacteria.
The alignment contains a completely conserved aspartate residue that may be functionally
important. The eukaryotic domains contain three conserved cysteines and a histidine that
might bind metal; however, these are absent in the archaebacterial proteins.

Number of members: 9 

847. (FF)

FF domain

This domain may be involved in protein-protein interaction [1].
Number of members: 42

[1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often
accompanies WW domains." Trends Biochem Sci 1999;24:264-265.

848. (FLO_LFY) 
Floricaula / Leafy protein 

This family consists of various plant development proteins which are homologues of the
floricaula (FLO) and Leafy (LFY) proteins, which are floral meristem identity proteins.
Mutations in the sequences of these proteins affect flower and leaf development.

Number of members: 16

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline:
97411151 "UNIFOLIATA regulates leaf and flower morphogenesis in pea." Curr Biol
1997;7:581-587.

[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452
"LEAFY controls floral meristem identity in Arabidopsis." Cell 1992;69:843-859.

849. (G-patch)

G-patch domain

This domain is found in a number of RNA binding proteins, and is also found in proteins that
contain RNA binding domains. This suggests that this domain may have an RNA binding
function. This domain has seven highly conserved glycines.

Number of members: 47

[1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in
eukaryotic RNA-processing proteins and type D retroviral polyproteins." Trends Biochem
Sci 1999;24:342-344.

850. (Gram-ve_porins)

General diffusion Gram-negative porins signature

Cross-reference(s): PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic
compounds. Proteins, known as porins [1], are responsible for the 'molecular sieve' properties
of the outer membrane. Porins form large water-filled channels which allow the diffusion of
hydrophilic molecules into the periplasmic space. Some porins form general diffusion
channels that allow any solute up to a certain size (that size is known as the exclusion limit)
to cross the membrane, while other porins are specific for a solute and contain a binding site
for that solute inside the pores (these are known as selective porins). As porins are the major
outer membrane proteins, they also serve as receptor sites for the binding of phages and
bacteriocins. General diffusion porins generally assemble as trimers in the membrane, and the
transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been
shown [3] that a number of general porins are evolutionarily related. These porins are:

- Enterobacteria phoE.

- Enterobacteria ompC.

- Enterobacteria ompF.

- Enterobacteria nmpC.

- Bacteriophage PA-2 LC.

- Neisseria PI.A.

- Neisseria PI.B.

As a signature pattern a conserved region was selected, located in the C-terminal part of these
proteins, which spans two putative transmembrane beta strands.

Consensus pattern: [LIV M FYj[LI ¥M FY SEP ID N O:18) 3-x(2)-G-x(2)-Y-x-F-x-K-x(2)- 
[SN]4S¥AV4[S TAV SEP ID NO:H)5yi4inA^ SEP ID NO:26}j- V 

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).
[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).
[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

851. (HlyD)
HlyD family secretion proteins signature

Cross-reference(s): PS00543; HLYD_FAMILY
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Gram-negative bacteria produce a number of proteins which are secreted into the growth 
medium by a mechanism that does not require a cleaved N-terminal signal sequence. These 
proteins, while having different functions, require the help of two or more proteins for their 
secretion across the cell envelope. Amongst which a protein belonging to the ABC 
transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a 
family which is currently composed [1 to 5] of the following members: 
Gene   Species                     Protein which is exported
hlyD   Escherichia coli            Hemolysin
appD   A.pleuropneumoniae          Hemolysin
lcnD   Lactococcus lactis          Lactococcin A
lktD   A.actinomycetemcomitans     Leukotoxin
       Pasteurella haemolytica
rtxD   A.pleuropneumoniae          Toxin-III
cyaD   Bordetella pertussis        Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli            Colicin V
prtE   Erwinia chrysanthemi        Extracellular proteases B and C
aprE   Pseudomonas aeruginosa      Alkaline protease
emrA   Escherichia coli            Drugs and toxins
yjcR   Escherichia coli            Unknown

These proteins are evolutionarily related and consist of 390 to 480 amino acid residues.
They seem to be anchored in the inner membrane by an N-terminal transmembrane region.
Their exact role in the secretion process is not yet known. The C-terminal section of these
proteins is the best conserved region; a signature pattern was derived from that region.

Consensus pattern: [...]-[GE]-x-[KR]-x-[LIVMFYW](2)-x-[LIVMFYW](3)
([LIVMFYW] is SEQ ID NO:26)

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and
yjcR.

References:

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).
[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).
[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ.
Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

852. (IBR)

In Between Ring fingers

The IBR (In Between Ring fingers) domain occurs between pairs of ring fingers
(zf-C3HC4). The function of this domain is unknown. This domain has also been called the
C6HC domain and the DRIL (double RING finger linked) domain [2].
Number of members: 25

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends
Biochem Sci 1999;24:229-231.
[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline:
99349709 "TRIADs: a new class of proteins with a novel cysteine-rich signature." Protein
Sci 1999;8:1557-1561.

853. (IPPT)

IPP transferase

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 "The
modified nucleoside 2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is
required for expression of virulence genes." J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline:
94187700 "Subcellular locations of MOD5 proteins: mapping of sequences sufficient for
targeting to mitochondria and demonstration that mitochondrial and nuclear isoforms
commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 "MOD5
translation initiation sites determine N6-isopentenyladenosine modification of mitochondrial
and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

854. (KE2)

KE2 family protein

The function of members of this family is unknown, although they have been suggested to
contain a DNA binding leucine zipper motif [2].

Number of members: 9

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed
gene KE2 from the mouse H-2K region." Gene 1991;107:345-346.

[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene
encoding a protein showing homology to mouse KE2 and containing a putative
leucine-zipper motif." Gene 1994;151:197-201.

855. (Lipoprotein^)) 

Prokaryotic membrane lipoprotein lipid attachment site 

Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide,
which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The
peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pep.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding
protein. This is the first archaebacterial protein known to be modified in such a fashion.

From the precursor sequences of all these proteins, a consensus pattern and a set of rules
to identify this type of post-translational modification were derived.

Consensus pattern: {DERK}(6)-[...]-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid
attachment site]
({DERK} is SEQ ID NO:354)

Additional rules:
1) The cysteine must be between positions 15 and 35 of the sequence in consideration.
2) There must be at least one Lys or one Arg in the first seven positions of the sequence.

Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
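The two additional rules lend themselves to a mechanical check. The following is a minimal Python sketch under stated assumptions: the precursor is given as a one-letter amino acid string, positions are 1-based, and the function name `lipobox_candidates` and the test sequences are hypothetical. It checks only the [AGS]-C tail of the pattern together with the two positional rules, not the full consensus pattern.

```python
import re

def lipobox_candidates(seq):
    """Return 1-based positions of cysteines that pass the two additional rules.

    Hypothetical helper: only the [AGS]-C tail of the consensus pattern is
    tested, together with the positional rules; the upstream elements of the
    pattern are not checked.
    """
    seq = seq.upper()
    # Rule 2: at least one Lys or one Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    hits = []
    # A lookahead is used so that every [AGS]-C occurrence is reported.
    for m in re.finditer(r"(?=[AGS]C)", seq):
        cys_pos = m.start() + 2  # 1-based position of the cysteine
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            hits.append(cys_pos)
    return hits

# Illustrative sequence only (an assumption for this sketch).
print(lipobox_candidates("MKATKLVLGAVILGSTLLAGCSSNAKIDQ"))
```

A sequence that returns an empty list fails at least one of the two rules or lacks the [AGS]-C tail; a non-empty result is only a candidate and still has to match the rest of the pattern.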

References 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol.
Chem. 269:14939-14945(1994).

856. (Lipoprotein_7) 
Adhesin lipoprotein 

This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins
from Mycoplasma hominis. M. hominis is a mycoplasma associated with human urogenital
diseases, pneumonia, and septic arthritis [1]. An adhesin is a cell surface molecule that
mediates adhesion to other cells or to the surrounding surface or substrate. The Vaa antigen is
a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a
periodic peptide structure, and is highly immunogenic in the human host [1]. p50 is also a
50-kDa lipoprotein, having three repeats A, B and C, that may be a tetramer of 191-kDa in its
native environment [2].

Number of members: 18

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation
of a Mycoplasma hominis adhesin encoded by divergent vaa genes." Infect Immun
1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675
"Repetitive elements of the Mycoplasma hominis adhesin p50 can be differentiated by
monoclonal antibodies." Infect Immun 1996;64:4027-4034.

857. (MaoC_like)
MaoC like domain

The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17
beta-dehydrogenase 4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase
beta subunit. All these enzymes contain other domains. This domain is also present in the
NodN nodulation protein N. No specific function has been assigned to this region of any of
these proteins. The maoC gene is part of an operon with maoA, which is involved in the
synthesis of monoamine oxidase [1].

Number of members: 46

[1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A
monoamine-regulated Klebsiella aerogenes operon containing the monoamine oxidase
structural gene (maoA) and the maoC gene." J Bacteriol 1992;174:2485-2492.

858. (MSP)

Manganese-stabilizing protein / photosystem II polypeptide

This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving
complex (OEC) of plants and cyanobacteria. The protein is also known as the
manganese-stabilizing protein as it is associated with the manganese complex of the OEC and may
provide the ligands for the complex [1].

Number of members: 17

[1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and
mutational analysis of the gene encoding the Photosystem II manganese-stabilizing
polypeptide of Synechocystis 6803." Mol Gen Genet 1988;212:418-425.

859. (NAC) 

[1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV;
Medline: 99342100 "Comparative genomics of the Archaea (Euryarchaeota): evolution of
conserved protein families, the stable core, and the variable shell." Genome Res
1999;9:608-628.

Number of members: 27

860. (Nop) 

Putative snoRNA binding domain

This family consists of various pre-RNA processing ribonucleoproteins. The function of the
aligned region is unknown; however, it may be a common RNA, snoRNA or Nop1p binding
domain. Nop5p (Nop58p) Swiss:Q12499 from yeast is the protein component of a
ribonucleoprotein required for pre-18S rRNA processing and is suggested to function
with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with
Nop1p and are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for
pre-mRNA splicing in S. cerevisiae [3].

Number of members: 23

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a
small nucleolar ribonucleoprotein component required for pre-18 S rRNA processing in
yeast." J Biol Chem 1998;273:16453-16463.

[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 8038777 "Nucleolar KKE/D repeat
proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis."
Mol Cell Biol 1997;17:7088-7098.
[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869
"The PRP31 gene encodes a novel protein required for pre-mRNA splicing in Saccharomyces
cerevisiae." Nucleic Acids Res 1996;24:1164-1170.

861. (Nramp)

Natural resistance-associated macrophage protein

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1,
Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of
functionally related proteins defined by a conserved hydrophobic core of ten transmembrane
domains [5]. This family of membrane proteins are divalent cation transporters. Nramp1 is an
integral membrane protein expressed exclusively in cells of the immune system and is
recruited to the membrane of a phagosome upon phagocytosis [1]. By controlling divalent
cation concentrations, Nramp1 may regulate the intraphagosomal replication of bacteria [1].
Mutations in Nramp1 may genetically predispose an individual to susceptibility to diseases
including leprosy and tuberculosis; conversely, they might provide protection from
rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and
Zn2+, amongst others. It is expressed at high levels in the intestine and is a major
transferrin-independent iron uptake system in mammals [1]. The yeast proteins Smf1 and Smf2
may also transport divalent cations [3].

Number of members: 36
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[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance 
to microbial infections." Inflamm Res 1998;47:277-284. 

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular
parasitism." Mol Microbiol 1998;28:403-412.
[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional
complementation of the yeast divalent cation transporter family SMF by NRAMP2, a
member of the mammalian natural resistance-associated macrophage protein family." J Biol
Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections:
comparative genomic analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline:
96036029 "Nramp defines a family of membrane proteins." Proc Natl Acad Sci USA
1995;92:10089-10093.

862. (NTP_transf_2) 
Nucleotidyltransferase domain 

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient
nucleotidyltransferase superfamily." Trends Biochem Sci 1995;20:345-347.

863. (Paramyxo_P)
Paramyxovirus P phosphoprotein

This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and
bovine parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase
complex formed from the P and L proteins [1]. The exact role of the P protein in this complex
is unknown, but it is involved in multiple protein-protein interactions and in binding the
polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to
be important for the proper folding of the L protein [1]. The paramyxoviruses have a
negative sense ssRNA genome [1].

Number of members: 15

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual
Functions of the Sendai Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.
[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868
"The P gene of human parainfluenza virus type 1 encodes P and C proteins but not a
cysteine-rich V protein." J Virol 1991;65:3406-3410.



864. (Patatin) 

This family consists of various patatin glycoproteins from plants. The patatin protein
accounts for up to 40% of the total soluble protein in potato tubers [2]. Patatin is a storage
protein, but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage
of fatty acids from membrane lipids [2].

Number of members: 21

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a
non-sucrose-inducible patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of
the patatin multigene family of potato." Gene 1988;62:27-44.

865. (Pentapeptide_2) 

Pentapeptide repeats (8 copies)

These repeats are found in many mycobacterial proteins. They are most common in
the PPE family of proteins, where they are found in the MPTR subfamily of PPE proteins.
The function of these repeats is unknown. The repeat can be approximately described as
XNXGX, where X can be any amino acid. These repeats are similar to Pentapeptide [1];
however, it is not clear if these two families are structurally related.
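The approximate XNXGX description can be turned into a simple scan for tandem copies. The following is a minimal Python sketch, assuming a one-letter amino acid string as input; the function name `find_xnxgx_runs`, the `min_copies` threshold and the example sequence are hypothetical and are not part of the family definition, which rests on a profile rather than on this literal motif.

```python
import re

def find_xnxgx_runs(seq, min_copies=2):
    """Return (start, end, copies) for each run of tandem XNXGX repeats.

    start and end are 1-based and inclusive; X is taken to be any letter.
    """
    run = re.compile(r"(?:[A-Z]N[A-Z]G[A-Z]){%d,}" % min_copies)
    return [
        (m.start() + 1, m.end(), (m.end() - m.start()) // 5)
        for m in run.finditer(seq.upper())
    ]

# Hypothetical sequence with three tandem copies; not a real PPE protein.
print(find_xnxgx_runs("MM" + "ANAGA" * 3 + "KK"))
```

Because the regular expression takes the leftmost match, a region in which N and G recur with irregular spacing may be reported in a frame that is not the longest possible one; the sketch is meant for quick inspection only.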

Number of members: 362

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of
pentapeptide repeats in bacteria." Protein Sci 1998;7:1477-1480.
[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K,
Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor
R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K,
Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium tuberculosis
from the complete genome sequence." Nature 1998;393:537-544.

866. (Peptidase_C13) 
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin
degrading enzyme from blood parasites Swiss:P42665. However, relatives with other functions
are found in plants and other organisms. Members of this family are asparaginyl
peptidases [1].

Number of members: 26

[1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E,
Watts C, Barrett AJ; Medline: 97218252 "Cloning, isolation, and characterization of
mammalian legumain, an asparaginyl endopeptidase." J Biol Chem 1997;272:8090-8098.

867. (Pro_dh) 
Proline dehydrogenase 
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Number of members: 25 

[1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the
proline dehydrogenase and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the
multifunctional Escherichia coli PutA protein." J Mol Biol 1994;243:950-956.

868. (PsbP) 

This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or
PsbP, from various plants (where it is encoded by the nuclear genome) and cyanobacteria.
The 23 kDa PsbP protein is required for PSII to be fully operational in vivo; it increases the
affinity of the water oxidation site for Cl- and provides the conditions required for high
affinity binding of Ca2+ [2].

Number of members: 25

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation
and photoinhibition are competing in a mutant of Chlamydomonas reinhardtii lacking the
23-kDa extrinsic subunit of photosystem II." J Biol Chem 1996;271:28918-28924.
[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the
psbP gene encoding precursor of 23-kDa polypeptide of oxygen-evolving complex in
Arabidopsis thaliana and its expression in the wild-type and a constitutively
photomorphogenic mutant." DNA Res 1996;3:277-285.

869. (PUA) 

The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase,
was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine
synthases, a family of predicted ATPases that may be involved in RNA modification, and a
family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain
was detected in a family of eukaryotic proteins that also contain a domain homologous to the
translation initiation factor eIF1/SUI1; these proteins may comprise a novel type of
translation factors. Unexpectedly, the PUA domain was also detected in bacterial and yeast
glutamate kinases; this is compatible with the demonstrated role of these enzymes in the
regulation of the expression of other genes [1]. It is predicted that the PUA domain is an
RNA binding domain.

Number of members: 48

[1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains
associated with the translation machinery." J Mol Evol 1999;48:291-302.

870. (RF1) 
eRF1-like proteins

Members of this family are peptide chain release factors. The eukaryotic Release Factor 1
proteins (eRF1s) are involved in termination of translation. The eRF1 protein is functional for
all stop codons and appears to abolish read-through of these codons. This family also
includes other proteins for which the precise molecular function is unknown. Many of them
are from Archaebacteria. These proteins may also be involved in translation termination, but
this awaits experimental verification.

Number of members: 25

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I,
Haenni AL, Celis JE, Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic
protein family possessing properties of polypeptide chain release factor" [see comments]
Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL;
Medline: 97315314 "Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes
with suppressor tRNAs at all three termination codons in messenger RNA." Nucleic Acids
Res 1997;25:2254-2258.
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871. (RibosomalJL14e)Ribosomal protein L14 

This family includes the eukaryotic ribosomal protein L14.
Number of members: 15

872. (Ribosomal_S27)
Ribosomal protein S27a

This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is
synthesized as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the
C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin
promotes their incorporation into nascent ribosomes by a transient metabolic stabilization and
is required for efficient ribosome biogenesis [3]. The ribosomal extension protein S27a
contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a
mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and
ribosomes, a source of proteins [2].

Number of members: 36

873. (Spermine_synth)
Spermine/spermidine synthase

Spermine and spermidine are polyamines. This family includes spermidine synthase, which
catalyses the fifth (last) step in the biosynthesis of spermidine from arginine, and spermine
synthase.

Number of members: 39

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of
two chicken cDNAs encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino
acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin
extension as ribosomal protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors
are ribosomal proteins whose fusion to ubiquitin facilitates ribosome biogenesis." Nature
1989;338:394-401.

874. (Surp) 
Surp module 

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing
and identification of functional domains in vertebrate homologs to the Drosophila splicing
regulator, suppressor-of-white-apricot." J Biol Chem 1994;269:16170-16179.

This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot.
It has been suggested that these domains may be RNA binding [1].

Number of members: 32

875. (TFIIE) 
TFIIE alpha subunit 

The general transcription factor TFIIE has an essential role in eukaryotic transcription
initiation together with RNA polymerase II and other general factors. Human TFIIE consists
of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta Swiss:P29084, and joins the
preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of the
conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from
archaebacteria that are also presumed to be TFIIE-alpha subunits (Swiss:O29501) [3].

Number of members: 12

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline:
92065982 "Structural motifs and potential sigma homologies in the large subunit of human
general transcription factor TFIIE." Nature 1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of
two large subdomains in TFIIE-alpha on the basis of homology between Xenopus and human
sequences." Nucleic Acids Res 1992;20:5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn
M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC,
Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA,
McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343 "The complete
genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus
fulgidus." Nature 1997;390:364-370.



876. (Transglut_core)

Cross-reference(s) PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze
the cross-linking of proteins by promoting the formation of isopeptide bonds between the
gamma-carboxyl group of a glutamine in one polypeptide chain and the epsilon-amino group
of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines
to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma
tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits.
Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other
forms of transglutaminases are widely distributed in various organs, tissues and body fluids.
Sequence data are available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis
and important for the formation of the cornified cell envelope (gene TGM1).
- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the
cytoplasm (gene TGM2).
- Transglutaminase 3, responsible for the later stages of cell envelope formation in the
epidermis and the hair follicle (gene TGM3).
- Transglutaminase 4 (gene TGM4).

A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The
erythrocyte membrane band 4.2 protein, which probably plays an important role in regulating
the shape of erythrocytes and their mechanical properties, is evolutionarily related to TGases.
However, the active site cysteine is substituted by an alanine and the 4.2 protein does not
show TGase activity.

Consensus pattern: [GT]-Q-[CA]-W-[...]-R-[CSA]-[LV]-G [The first C is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[1] Ichinose A., Bottenus R.E., Davie E.W., J. Biol. Chem. 265:13411-13414(1990).
[2] Greenberg C.S., Birckbichler P.J., Rice R.H., FASEB J. 5:3071-3077(1991).

877. (TruB_N)

TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out
the conversion of uracil bases to pseudouridine. This family includes TruB, a pseudouridylate
synthase that specifically converts uracil 55 to pseudouridine in most tRNAs. This family
also includes Cbf5p, which modifies rRNA [2].

Number of members: 33

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification,
cloning, and properties of the tRNA psi 55 synthase from Escherichia coli." RNA
1995;1:102-112.

[2] Lafontaine DLJ, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D;
Medline: 98139521 "The box H + ACA snoRNAs carry Cbf5p, the putative rRNA
pseudouridine synthase." Genes Dev 1998;12:527-537.
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878. (UDPGP) 

UTP--glucose-1-phosphate uridylyltransferase

This family consists of UTP--glucose-1-phosphate uridylyltransferases, EC:2.7.7.9, also
known as UDP-glucose pyrophosphorylase (UDPGP) and glucose-1-phosphate
uridylyltransferase. UTP--glucose-1-phosphate uridylyltransferase catalyses the
interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion, involved in
various metabolic roles depending on tissue type [1]. In Dictyostelium (slime mold), mutants
in this enzyme abort the development cycle [2]. Also within the family are
UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical
proteins from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 18

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence
differences between human muscle and liver cDNAs for UDPglucose pyrophosphorylase and
kinetic properties of the recombinant enzymes expressed in Escherichia coli." Eur J Biochem
1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose
pyrophosphorylase gene of Dictyostelium discoideum." Nucleic Acids Res
1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 "The eukaryotic
UDP-N-acetylglucosamine pyrophosphorylases. Gene cloning, protein expression, and
catalytic mechanism." J Biol Chem 1998;273:14392-14397.

879. (UPF0044)

Uncharacterized protein family UPF0044 signature
Cross-reference(s) PS01301; UPF0044

The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqeI.
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus
influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database
by the following pattern. This pattern is located in the N-terminal part of
these proteins.

Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-
[GA]-x(2)-G
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.
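A pattern written in this notation can be translated mechanically into a regular expression. The following is a minimal Python sketch and not part of the original entry: the function name `prosite_to_regex` and the test sequence are hypothetical, and only the syntax elements that appear in this section are handled (single residues, `x`, `[...]` sets, `{...}` exclusions, and `(n)` or `(n,m)` repeat counts).

```python
import re

# One pattern element: a residue, the wildcard x, a [set] or an {exclusion},
# optionally followed by a repeat count such as (3) or (16,18).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

UPF0044 = ("L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-"
           "[GA]-x(2)-G")
UPF0044_RE = re.compile(prosite_to_regex(UPF0044))

# Hypothetical sequence built to satisfy the pattern; not a real protein.
example = "MM" + "LSAAAKAAAKSAGHALAPLAALGAAG" + "MM"
match = UPF0044_RE.search(example)
print(match.start() + 1, match.end())  # 1-based start and end of the hit
```

Elements outside this subset, such as the `<` and `>` terminus anchors of the full PROSITE syntax, raise a `ValueError` in this sketch instead of being silently mistranslated.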

880. (zf-A20) 
A20-like zinc finger 

A20- (an inhibitor of cell death)-like zinc fingers. The zinc
finger mediates self-association in A20. These fingers also
mediate IL-1-induced NF-kappa B activation.

Number of members: 22

[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein
A20 inhibits IL-1-induced NF-kappaB activation at the level of TRAF6." FEBS Lett
1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline:
96390831 "A20, an inhibitor of cell death, self-associates by its
zinc finger domain." FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 "The tumor necrosis
factor-inducible zinc finger protein A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB
activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.
[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by
tumor necrosis factor alpha encodes a novel type of zinc finger protein." J Biol Chem
1990;265:14705-14708.

881. (zf-PARP)

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2

Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that
catalyzes the covalent attachment of ADP-ribose units from NAD(+) to various nuclear
acceptor proteins. This post-translational modification of nuclear proteins is dependent
on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation as well as in the
regulation of the molecular events involved in the recovery of the cell from DNA damage.
Structurally, PARP, about 1000 amino acid residues long, consists of three distinct
domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of
zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The
zinc finger domains of PARP seem to bind specifically to single-stranded DNA. DNA ligase
III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar to
those of PARP.

Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The
three C's and the H are zinc ligands]
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.
Sequences known to belong to this class detected by the profile: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.
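Since the pattern above names the three cysteines and the histidine as zinc ligands, a scan can report their positions directly. The following is a hedged sketch in Python: the regular expression is a hand translation of the pattern quoted above, the function name zinc_ligand_positions is hypothetical, and positions are counted from 1.

```python
import re

# Hand translation of C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.
# The four capture groups mark the residues described as zinc ligands.
PARP_ZN_FINGER = re.compile(
    r"(C)[KR].(C).{3}I.K.{3}[RG].{16,18}W[FYH](H).{2}(C)"
)


def zinc_ligand_positions(seq):
    """Return 1-based positions of the C, C, H, C ligands of the first match,
    or None when the sequence does not contain the pattern."""
    m = PARP_ZN_FINGER.search(seq)
    if m is None:
        return None
    return [m.start(group) + 1 for group in range(1, 5)]
```

The variable spacer x(16,18) is the only part of the pattern whose length can change, so the first two ligand positions are fixed relative to the start of a match while the last two shift with the spacer.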

Note: This documentation entry is linked to both signature patterns and a profile. As the 
profile is much more sensitive than the patterns, you should use it if you have access to the 
necessary software tools to do so.

[ 1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).
[ 2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).
[ 3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P.,
Shell B.K., Nash R.A., Schar P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol.
15:3206-3216(1995).

882. Adenylylsulfate kinase (APSkinase) 

Enzyme that catalyses the phosphorylation of adenylylsulfate to 3'-phosphoadenylylsulfate.
This domain contains an ATP binding P-loop motif. Number of members: 34

[1] MacRae IJ, Rose AB, Segel IH; Medline: 99003196 "Adenosine 5'-phosphosulfate kinase
from Penicillium chrysogenum. Site-directed mutagenesis at putative phosphoryl-accepting
and ATP P-loop residues." J Biol Chem 1998;273:28583-28589.

883. DNA polymerase family B signature DNA_POLYMERASE_B (DNA_pol_B)

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the
accurate replication of DNA. They require either a small RNA molecule or a protein as a
primer for the de novo synthesis of a DNA chain. On the basis of sequence similarity, a
number of DNA polymerases have been grouped [1 to 7] under the designation of DNA
polymerase family B. These are:

- Higher eukaryotes polymerases alpha. 

- Higher eukaryotes polymerases delta. 

- Yeast polymerase I/alpha (gene POL1), polymerase II/epsilon (gene POL2), polymerase
III/delta (gene POL3) and polymerase REV3.

- Escherichia coli polymerase II (gene dinA or polB). 

- Archaebacterial polymerases. 

- Polymerases of viruses from the herpesviridae family. 

- Polymerases from Adenoviruses.

- Polymerases from Baculoviruses. 

- Polymerases from Chlorella viruses. 

- Polymerases from Poxviruses.

- Bacteriophage T4 polymerase.

- Podoviridae bacteriophages Phi-29, M2 and PZA polymerase. 

- Tectiviridae bacteriophage PRD1 polymerase. 

- Polymerases encoded on mitochondrial linear DNA plasmids in various fungi and plants
(Kluyveromyces lactis pGKL1 and pGKL2, Agaricus bitorquis pEM, Ascobolus immersus
pAI2, Claviceps purpurea pCLK1, Neurospora Kalilo and Maranhar, maize S-1, etc).

Six regions of similarity (numbered from I to VI) are found in all or a subset of the above 
polymerases. The most conserved region (I) includes a conserved tetrapeptide with two 
aspartate residues. Its function is not yet known. However, it has been suggested [3] that it 
may be involved in binding a magnesium ion. This conserved region was selected as a 
signature for this family of DNA polymerases. 

Consensus pattern: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC (SEQ ID NO:724)]-x-[LIVMSTAC (SEQ ID NO:151)]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast
polymerase II/epsilon, Agaricus bitorquis pEM and Sulfolobus solfataricus polymerase II.

[ 1] Jung G., Leavitt M.C., Hsieh J.-C, Ito J. Proc. Natl. Acad. Sci. U.S.A. 84:8287- 
8291(1987). 

[ 2] Bernad A., Zaballos A., Salas M., Blanco L. EMBO J. 6:4219-4225(1987).

[ 3] Argos P. Nucleic Acids Res. 16:9909-9916(1988).

[ 4] Wang T.S.-F., Wong S.W., Korn D. FASEB J. 3:14-21(1989).

[ 5] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).

[ 6] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).

[ 7] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).

884. DNA polymerase family X signature - DNA_POLYMERASE_X (DNA_polymeraseX)

DNA polymerases (EC 2.7.7.7) can be classified, on the basis of sequence similarity [1], into 
at least four different groups: A, B, C and X. DNA polymerases that belong to family X are 
listed below [2]: 



Reference No. 2750-942P 




- Vertebrate polymerase beta, involved in DNA repair. 

- Yeast polymerase IV (POL4) [3], an enzyme with similar characteristics to that of the 
mammalian polymerase beta. 

- Terminal deoxynucleotidyltransferase (TdT) (EC 2.7.7.31). TdT catalyzes the elongation of
polydeoxynucleotide chains by terminal addition. One of the functions of this enzyme is the
addition of nucleotides at the junction of rearranged Ig heavy chain and T cell receptor gene
segments during the maturation of B and T cells.

- African Swine Fever virus protein O174L [4].

- Fission yeast hypothetical protein SpAC2F7.06c. 

These enzymes are small (about 40 Kd) compared with other polymerases and their reaction 
mechanism operates via a distributive mode, i.e. they dissociate from the template-primer 
after addition of each nucleotide. 

As a signature pattern for this family of DNA polymerases, a highly conserved region that
contains a conserved arginine and two conserved aspartic acid residues was selected. The
latter together with the arginine have been shown [5] to be involved in primer binding in
polymerase beta.

Consensus pattern: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM (SEQ ID NO:4)]-D-[LIVMFY (SEQ ID NO:18)](3)-x(2)-[SAP]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).

[ 2] Matsukage A., Nishikawa K., Goi T., Seto Y., Yamaguchi M. J. Biol. Chem. 262:8960-8962(1987).

[ 3] Prasad R., Widen S.G., Singhal R.K., Watkins J., Prakash L., Wilson S.H. Nucleic Acids
Res. 21:5301-5307(1993).

[ 4] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela
E. Virology 208:249-278(1995).

[ 5] Date T., Yamamoto S., Tanihara K., Nishimoto Y., Matsukage A. Biochemistry 30:5286-5292(1991).

885. DUF14-Domain of unknown function

This domain is found in glutamate synthase, tungsten formylmethanofuran dehydrogenase 
subunit c (FwdC) and molybdenum formylmethanofuran dehydrogenase subunit c (FmdC). 
It has no known function. Number of members: 52 

[1] Hochheimer A, Hedderich R, Thauer RK; Medline: 99035764. "The formylmethanofuran 
dehydrogenase isoenzymes in Methanobacterium wolfei and Methanobacterium 
thermoautotrophicum: induction of the molybdenum isoenzyme by molybdate and 
constitutive synthesis of the tungsten isoenzyme." Arch Microbiol 1998;170:389-393. 

886. DUF18-Domain of unknown function

This domain of unknown function is found in several C. elegans proteins. The domain is 120 
amino acids long and rich in cysteine residues. There are 16 conserved cysteine positions in 
the domain. Number of members: 34 

887. DUF27-Domain of unknown function 

This domain is found in a number of otherwise unrelated proteins. This domain is found at
the C-terminus of the macro-H2A histone protein Swiss:Q02874. This domain is found in
the non-structural proteins of several types of ssRNA viruses such as NSP2 from alphaviruses
Swiss:P03317. This domain is also found on its own in a family of proteins from bacteria
Swiss:P75918, archaebacteria Swiss:O59182 and eukaryotes Swiss:Q17432, suggesting that
it is involved in an important and ubiquitous cellular process. Number of members: 66

888. DUF37-Domain of unknown function 

This domain is found in short (70 amino acid) hypothetical proteins from various bacteria.
The domain contains three conserved cysteine residues. Swiss:Q44066 from Aeromonas
hydrophila has been found to have hemolytic activity (unpublished). Number of members:
19

889. EGF-like domain signatures. (EGF-like)

A sequence of about thirty to forty amino-acid residues found in the sequence of
epidermal growth factor (EGF) has been shown [1 to 6] to be present, in a more or less
conserved form, in a large number of other, mostly animal proteins. The proteins currently
known to contain one or more copies of an EGF-like pattern are listed below.

- Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6 copies). 

- Agrin, a basal lamina protein that causes the aggregation of acetylcholine receptors on
cultured muscle fibers (4 copies).

- Amphiregulin, a growth factor (1 copy). 

- Betacellulin, a growth factor (1 copy). 

- Blastula proteins BP10 and Span from sea urchin which are thought to be involved in
pattern formation (1 copy).

- BM86, a glycoprotein antigen of cattle tick (7 copies).

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone 
formation and which expresses metalloendopeptidase activity (1-2 copies). Homologous 
proteins are found in sea urchin - suBMP (1 copy) - and in Drosophila - the dorsal-ventral 
patterning protein tolloid (2 copies). 

- Caenorhabditis elegans developmental proteins lin-12 (13 copies) and glp-1 (10 copies).

- Caenorhabditis elegans APX-1 protein, a patterning protein (4.5 copies). 

- Calcium-dependent serine proteinase (CASP) which degrades the extracellular matrix 
proteins type I and IV collagen and fibronectin (1 copy). 

- Cartilage matrix protein CMP (1 copy). 

- Cartilage oligomeric matrix protein COMP (4 copies).

- Cell surface antigen 114/A10 (3 copies).

- Cell surface glycoprotein complex transmembrane subunit ASGP-2 from rat (2 copies). 

- Coagulation associated proteins C, Z (2 copies) and S (4 copies). 

- Coagulation factors VII, IX, X and XII (2 copies). 

- Complement C1r components (1 copy).

- Complement C1s components (1 copy).

- Complement-activating component of Ra-reactive factor (RARF) (1 copy). 

- Complement components C6, C7, C8 alpha and beta chains, and C9 (1 copy). 

- Crumbs, an epithelial development protein from Drosophila (29 copies). 

- Epidermal growth factor precursor (7-9 copies).

- Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy). 

- Fat protein, a Drosophila cadherin-related tumor suppressor (5 copies).

- Fetal antigen 1, a probable neuroendocrine differentiation protein, which is derived from
the delta-like protein (DLK) (6 copies).

- Fibrillin 1 (47 copies) and fibrillin 2 (14 copies). 

- Fibropellins IA (21 copies), IB (13 copies), IC (8 copies), II (4 copies) and III (8 copies)
from the apical lamina - a component of the extracellular matrix - of sea urchin.

- Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies).

- Giant-lens protein (protein Argos), which regulates cell determination and axon guidance in 
the Drosophila eye (1 copy). 

- Growth factor-related proteins from various poxviruses (1 copy). 

- Gurken protein, a Drosophila developmental protein (1 copy). 

- Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor alpha 
(TGF-alpha), growth factors Lin-3 and Spitz (1 copy); the precursors are membrane proteins, 
the mature form is located extracellular. 

- Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies). 

- LDL and VLDL receptors, which bind and transport low-density lipoproteins and very low- 
density lipoproteins (3 copies). 

- LDL receptor-related protein (LRP), which may act as a receptor for endocytosis of 
extracellular ligands (22 copies). 

- Leucocyte antigen CD97 (3 copies), cell surface glycoprotein EMR1 (6 copies) and cell 
surface glycoprotein F4/80 (7 copies). 

- Limulus clotting factor C, which is involved in hemostasis and host defense mechanisms in
Japanese horseshoe crab (1 copy).

- Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy). 

- Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies). 

- Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy). 

- Neurexins from mammals (3 copies). 

- Neurogenic proteins Notch, Xotch and the human homolog Tan-1 (36 copies), Delta (9 
copies) and the similar differentiation proteins Lag-2 from Caenorhabditis elegans (2 copies), 
Serrate (14 copies) and Slit (7 copies) from Drosophila. 

- Nidogen (also called entactin), a basement membrane protein from chordates (2-6 copies). 

- Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies). 

- Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy). 

- Perforin, which lyses non-specifically a variety of target cells (1 copy).

- Proteoglycans aggrecan (1 copy), versican (2 copies), perlecan (at least 2 copies), brevican
(1 copy) and chondroitin sulfate proteoglycan (gene PG-M) (2 copies).

- Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which is found in the
endoplasmic reticulum.

- S1-5, a human extracellular protein whose ultimate activity is probably modulated by the
environment (5 copies). 

- Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well as a 
mitogen for different target cells (1 copy). 

- Selectins. Cell adhesion proteins such as ELAM-1 (E-selectin), GMP-140 (P-selectin), or 
the lymph-node homing receptor (L-selectin) (1 copy). 

- Serine/threonine-protein kinase homolog (gene Pro25) from Arabidopsis thaliana, which 
may be involved in assembly or regulation of light-harvesting chlorophyll A/B protein (2 
copies). 

- Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy). 

- Stromal cell derived protein-1 (SCP-1) from mouse (6 copies).

- TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy). 

- Tenascin (or neuronectin), an extracellular matrix protein from mammals (14.5 copies),
chicken (TEN-A) (13.5 copies) and the related proteins human tenascin-X (18 copies) and
tenascin-like proteins TEN-A and TEN-M from Drosophila (8 copies).

- Thrombomodulin (fetomodulin), which together with thrombin activates protein C (6 
copies). 

- Thrombospondin 1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins that mediate 
cell-to-cell and cell-to-matrix interactions. 

- Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy).

- Transforming growth factor beta-1 binding protein (TGF-B1-BP) (16 or 18 copies).

- Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies).

- Urokinase-type plasminogen activator (EC 3.4.21.73) (UPA) and tissue plasminogen
activator (EC 3.4.21.68) (TPA) (1 copy).

- Uromodulin (Tamm-horsfall urinary glycoprotein) (THP) (3 copies). 

- Vitamin K-dependent anticoagulants protein C (2 copies) and protein S (4 copies) and the 
similar protein Z, a single-chain plasma glycoprotein of unknown function (2 copies). 

- 63 Kd sperm flagellar membrane protein from sea urchin (3 copies).

- 93 Kd protein (gene nel) from chicken (5 copies).

- Hypothetical 337.6 Kd protein T20G5.3 from Caenorhabditis elegans (44 copies).

The functional significance of EGF domains in what appear to be unrelated proteins is not yet 
clear. However, a common feature is that these repeats are found in the extracellular domain 
of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin 
G/H synthase). The EGF domain includes six cysteine residues which have been shown (in 
EGF) to be involved in disulfide bonds. The main structure is a two-stranded beta-sheet 
followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the 
conserved cysteines strongly vary in length as shown in the following schematic 
representation of the EGF-like domain: 



               +-------------------+        +-------------------------+
               |                   |        |                         |
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
     |                   |         ************************************
     +-------------------+

'C': conserved cysteine involved in a disulfide bond.

'G': often conserved glycine

'a': often conserved aromatic amino acid

'*': position of both patterns.

'x': any residue

The region between the 5th and 6th cysteine contains two conserved glycines of which at 
least one is present in most EGF-like domains. Two patterns were created for this domain, 
each including one of these C-terminal conserved glycine residues. 

Consensus pattern: C-x-C-x(5)-G-x(2)-C [The 3 C's are involved in disulfide bonds]
Sequences known to belong to this class detected by the pattern: A majority, but not those that
have very long or very short regions between the last 3 conserved cysteines of their EGF-like
domain(s). Other sequence(s) detected in SWISS-PROT: 87 proteins, of which 27 can be
considered as possible candidates.

Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C [The three C's are involved in disulfide
bonds] Sequences known to belong to this class detected by the pattern: A majority, but not
those that have very long or very short regions between the last 3 conserved cysteines of their
EGF-like domain(s). Other sequence(s) detected in SWISS-PROT: 83 proteins, of which 49
can be considered as possible candidates.

Note: The beta chain of the integrin family of proteins contains 2 cysteine-rich repeats
which were said to be dissimilar with the EGF pattern [7].

Note: Laminin EGF-like repeats (see <PDOC00961>) are longer than the average EGF
module and contain a further disulfide bond C-terminal of the EGF-like region. Perlecan and
agrin contain both EGF-like domains and laminin-type EGF-like domains.

Note: The patterns do not detect all of the repeats of proteins with multiple EGF-like repeats.

Note: See <PDOC00913> for an entry describing specifically the subset of EGF-like domains
that bind calcium.
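Both EGF-like patterns can be tried on a short test sequence. The sketch below is illustrative only: the regular expressions are hand translations of the two patterns quoted above, scan_egf is a hypothetical helper name, and the 53-residue string is the sequence commonly quoted for mature human EGF, typed in here as an assumption that should be checked against a database entry before any real use.

```python
import re

# Hand translations of the two consensus patterns quoted above.
EGF_PATTERNS = {
    "EGF_1": r"C.C.{5}G.{2}C",            # C-x-C-x(5)-G-x(2)-C
    "EGF_2": r"C.C.{2}[GP][FYW].{4,8}C",  # C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C
}

# Assumed sequence of mature human EGF (53 residues, six cysteines).
HUMAN_EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"


def scan_egf(seq):
    """Return, for each pattern, the 1-based start positions of all matches.

    A lookahead is used so that overlapping matches in proteins with many
    tandem EGF-like repeats are all reported.
    """
    hits = {}
    for name, body in EGF_PATTERNS.items():
        lookahead = re.compile("(?=(" + body + "))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(seq)]
    return hits
```

Both patterns cover the stretch running from the fourth to the sixth conserved cysteine, which is the region marked with asterisks in the schematic, so a single EGF-like domain is expected to give at most one hit per pattern at the same start position.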

[ 1] Davis C.G. New Biol. 2:410-419(1990). 

[ 2] Blomquist M.C., Hunt L.T., Barker W.C. Proc. Natl. Acad. Sci. U.S.A. 81:7363-7367(1984).

[ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G. Protein Nucl. Acid Enz. 29:54-68(1986).

[ 4] Doolittle R.F., Feng D.F., Johnson M.S. Nature 307:558-560(1984).

[ 5] Appella E., Weber I.T., Blasi F. FEBS Lett. 231:1-4(1988).

[ 6] Campbell I.D., Bork P. Curr. Opin. Struct. Biol. 3:385-392(1993).

[ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C., Horwitz A.F., Hynes
R.O. Cell 46:271-282(1986).

890. Ham1 family (Ham1p_like)

This family consists of the HAM1 protein Swiss:P47119 and hypothetical archaeal, bacterial
and C. elegans proteins. HAM1 controls 6-N-hydroxylaminopurine (HAP) sensitivity and
mutagenesis in S. cerevisiae Swiss:P47119 [1]. The HAM1 protein protects the cell from
HAP, either on the level of deoxynucleoside triphosphate or the DNA level by a yet
unidentified set of reactions [1]. Number of members: 19

[1] Noskov VN, Staak K, Shcherbakova PV, Kozmin SG, Negishi K, Ono BC, Hayatsu H,
Pavlov YI; Medline: 96381244 "HAM1, the gene controlling 6-N-hydroxylaminopurine
sensitivity and mutagenesis in the yeast Saccharomyces cerevisiae." Yeast 1996;12:17-29.

891. (HCO3_cotransp)

Anion exchange is a cellular transport function which contributes to the regulation of cell pH
and volume. Anion exchangers are a family of functionally related proteins that contribute to
these properties by maintaining the intracellular level of the two principal anions: chloride
and HCO3-. The best characterized anion exchanger is the band 3 protein [1], which is an
erythrocyte anion exchange membrane glycoprotein. Band 3 is a protein of about 900 amino
acids which consists of a cytoplasmic N-terminal domain of about 400 residues and a
hydrophobic C-terminal section of about 500 residues that contains at least ten
transmembrane regions. The cytoplasmic domain provides binding sites for cytoskeletal
proteins, while the integral membrane domain is responsible for anion transport. Band 3
protein is specific to erythroid cells; at least two other proteins [2], structurally and
functionally related to band 3, are found in nonerythroid tissues:

- AE2 (or B3 related protein; B3RP), a protein of 1200 residues, which seems to be present 
in a variety of cell types including lymphoid, kidney, and choroid plexus. 

- AE3, a protein of 1200 residues, which is specific to neurons. 

Structurally AE2 and AE3 are very similar to band 3, the main difference being an extension 
of some 300 residues of the N-terminal domain in AE2 and AE3. 

Two signature patterns were developed for these proteins. The first pattern is based on a 
conserved stretch of sequence that contains four clustered positively charged residues and
which is located at the C-terminal extremity of the cytoplasmic domain, just before the first 
transmembrane segment from the integral domain. The second pattern is based on the 
perfectly conserved sequence of the fifth transmembrane segment; this segment contains a 
lysine, which is the covalent binding site for the isothiocyanate group of DIDS, an inhibitor 
of anion exchange. 

Consensus pattern: F-G-G-[LIVM (SEQ ID NO:4)](2)-[KR]-D-[LIVM (SEQ ID NO:4)]-[RK]-R-R-Y
Sequences known to belong to this class detected by the pattern: ALL.

Consensus pattern: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Jay D., Cantley L. Annu. Rev. Biochem. 55:511-538(1986). 
[ 2] Reithmeier R.A.F. Curr. Opin. Struct. Biol. 3:515-523(1993).

892. ATP phosphoribosyltransferase signature (HisG) 

ATP phosphoribosyltransferase (EC 2.4.2.17) is the enzyme that catalyzes the first step in the 
biosynthesis of histidine in bacteria, fungi and plants. It is a protein of about 23 to 32 Kd. As 
a signature pattern a region located in the C-terminal part of this enzyme was selected. 

Consensus pattern: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM]
Sequences known to belong to this class detected by the pattern: ALL.

893. HNH endonuclease (HNH) 
Number of members: 56 

[1] Shub DA, Goodrich-Blair H, Eddy SR; Medline: 95117127 "Amino acid sequence motif
of group I intron endonucleases is conserved in open reading frames of group II introns."
Trends Biochem Sci 1994;19:402-404.

[2] Dalgaard JZ, Klar AJ, Moser MJ, Holley WR, Chatterjee A, Mian IS; Medline: 98026854
"Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases
and identification of an intein that encodes a site-specific endonuclease of the HNH family."
Nucleic Acids Res 1997;25:4626-4638.

[3] Gorbalenya AE; Medline: 95004046 "Self-splicing group I and group II introns encode
homologous (putative) DNA endonucleases of a new family." Protein Sci 1994;3:1117-1120.

894. NEUROHYPOPHYSHORM (hormone5) 

Oxytocin (or ocytocin) and vasopressin [1] are small (nine amino acid residues), structurally 
and functionally related neurohypophysial peptide hormones. Oxytocin causes contraction of 
the smooth muscle of the uterus and of the mammary gland while vasopressin has a direct
antidiuretic action on the kidney and also causes vasoconstriction of the peripheral vessels.
Like the majority of active peptides, both hormones are synthesized as larger protein 
precursors that are enzymatically converted to their mature forms. Peptides belonging to this 
family are also found in birds, fish, reptiles and amphibians (mesotocin, isotocin, valitocin, 
glumitocin, aspargtocin, vasotocin, seritocin, asvatocin, phasvatocin), in worms (annetocin), 
octopi (cephalotocin), locust (locupressin or neuropeptide F1/F2) and in molluscs 
(conopressins G and S) [2]. The pattern developed to detect this category of peptides spans 
their entire sequence and includes four invariant amino acid residues. 

Consensus pattern: C-[LIFY (SEQ ID NO:550)](2)-x-N-[CS]-P-x-G [The two C's are
linked by a disulfide bond]. Sequences known to belong to this class detected by the pattern:
ALL.

[ 1] Acher R., Chauvet J. Biochimie 70:1197-1207(1988).

[ 2] Chauvet J., Michel G., Ouedraogo Y., Chou J., Chait B.T., Acher R. Int. J. Pept. Protein 
Res. 45:482-487(1995). 



895. 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK) 
All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. 
Most microorganisms must synthesize folate de novo because they lack the active transport 
system of higher vertebrate cells which allows these organisms to use dietary folates. 
Enzymes involved in folate biosynthesis are therefore targets for a variety of antimicrobial 
agents such as trimethoprim or sulfonamides. 7,8-dihydro-6-hydroxymethylpterin- 
pyrophosphokinase (EC 2.7.6.3) (HPPK) catalyzes the attachment of pyrophosphate to 6- 
hydroxymethyl-7,8-dihydropterin to form 6-hydroxymethyl-7,8-dihydropteridine 
pyrophosphate. This is the first step in a three-step pathway leading to 7,8-dihydrofolate. 
Bacterial HPPK (gene folK or sulD) [1] is a protein of 160 to 270 amino acids. In the lower 
eukaryote Pneumocystis carinii, HPPK is the central domain of a multifunctional folate 
synthesis enzyme (gene fas) [2]. As a signature for HPPK, a conserved region located in the 
central section of these enzymes was selected.

Consensus pattern £KRHD}[0
NQl727H-R^^ SEP TP NO:4^ (2)
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.

[ 1] Talarico T.L., Ray P.H., Dev I.K., Merrill B.M., Dallas W.S. J. Bacteriol. 174:5971- 
5977(1992). 

[ 2] Volpes F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

896. Metalloenzyme superfamily (Metalloenzyme) 

This family includes phosphopentomutase Swiss:P07651 and 2,3-bisphosphoglycerate- 
independent phosphoglycerate mutase, Swiss:P37689. This family is also related to 
alk_phosphatase [1]. The alignment contains the most conserved residues that are probably 
involved in metal binding and catalysis. Number of members: 34 

[1] Galperin MY, Bairoch A, Koonin EV; Medline: 99180418 "A superfamily of
metalloenzymes unifies phosphopentomutase and cofactor-independent phosphoglycerate
mutase with alkaline phosphatases and sulfatases." Protein Sci 1998;7:1829-1835.

897. Penicillin amidase (Penicilamidase) 

Penicillin amidase or penicillin acylase EC:3.5.1.11 catalyses the hydrolysis of
benzylpenicillin to phenylacetic acid and 6-aminopenicillanic acid (6-APA), a key
intermediate in the synthesis of penicillins [1]. Also in the family are cephalosporin acylase
Swiss:P07662 and Swiss:P29958 aculeacin A acylase, which are involved in the synthesis of
related peptide antibiotics. Number of members: 13

[1] Verhaert RM, Riemens AM, van der Laan JM, van Duin J, Quax WJ; Medline: 97438505
"Molecular cloning and analysis of the gene encoding the thermostable penicillin G acylase
from Alcaligenes faecalis." Appl Environ Microbiol 1997;63:3412-3418.

[2] Duggleby HJ, Tolley SP, Hill CP, Dodson EJ, Dodson G, Moody PC; Medline: 95115804
"Penicillin acylase has a single-amino-acid catalytic centre." Nature 1995;373:264-268.

898. Phosphoribosyl-AMP cyclohydrolase (PRA-CH) 

This enzyme catalyses the third step in the histidine biosynthetic pathway. It requires Zn ions 
for activity. Number of members: 13 

[1] D'Ordine RL, Klem TJ, Davisson VJ; Medline: 99129952 "N1-(5'-phosphoribosyl)adenosine-5'-monophosphate
cyclohydrolase: purification and
characterization of a unique metalloenzyme." Biochemistry 1999;38:1537-1546.

899. Phosphoribosyl-ATP pyrophosphohydrolase (PRA-PH) 

This enzyme catalyses the second step in the histidine biosynthetic pathway. Number of 
members: 32 

[1] Keesey JK Jr, Bigelis R, Fink GR; Medline: 79216449 "The product of the his4 gene
cluster in Saccharomyces cerevisiae. A trifunctional polypeptide." J Biol Chem 1979 Aug 
10;254:7427-7433. 

[2] Bruni CB, Carlomagno MS, Formisano S, Paolella G; Medline: 86310274 "Primary and 
secondary structural homologies between the HIS4 gene product of Saccharomyces 
cerevisiae and the hisIE and hisD gene products of Escherichia coli and Salmonella 
typhimurium." Mol Gen Genet 1986;203:389-396. 

900. Prokaryotic membrane lipoprotein lipid attachment site (PstS) 

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which 
is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase 
recognizes a conserved sequence and cuts upstream of a cysteine residue to which a 
glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such 
processing currently include (for recent listings see [1,2,3]): 

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).

- Escherichia coli lipoprotein-28 (gene nlpA).

- Escherichia coli lipoprotein-34 (gene nlpB). 

- Escherichia coli lipoprotein nlpC. 

- Escherichia coli lipoprotein nlpD. 

- Escherichia coli osmotically inducible lipoprotein B (gene osmB).

- Escherichia coli osmotically inducible lipoprotein E (gene osmE). 

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal). 

- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).

- Escherichia coli copper homeostasis protein cutF (or nlpE).

- Escherichia coli plasmids traT proteins.

- Escherichia coli Col plasmids lysis proteins. 

- A number of Bacillus beta-lactamases. 

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). 

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).

- Chlamydia trachomatis outer membrane protein 3 (gene omp3). 

- Fibrobacter succinogenes endoglucanase cel-3. 

- Haemophilus influenzae proteins Pal and Pep. 

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).

- Neisseria outer membrane protein H.8. 

- Pseudomonas aeruginosa lipopeptide (gene lppL). 
- Pseudomonas solanacearum endoglucanase egl.

- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). 

- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM. 

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). 
- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb). 

- Yersinia virulence plasmid protein yscJ. 
- Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding
protein. (This is the first archaebacterial protein known to be modified in such a fashion.)
From the precursor sequences of all these proteins, a consensus pattern was derived, together
with a set of rules to identify this type of post-translational modification.

Consensus pattern: {DERK (SEQ ID NO:354)}(6)-[LIVMFWSTAG (SEQ ID NO:352)](2)-
[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]

Additional rules:
1) The cysteine must be between positions 15 and 35 of the sequence in consideration.
2) There must be at least one Lys or one Arg in the first seven positions of the sequence.

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are not membrane
lipoproteins, but at least half of them could be.
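
The pattern and the two additional rules can be combined into a single filter. The following Python sketch is illustrative only: it assumes the consensus pattern reads {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C in PROSITE notation, and the function name `lipid_attachment_sites` and the example sequence are hypothetical.

```python
import re

# Assumed PROSITE-style pattern, translated to a regular expression:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# The lookahead lets overlapping candidate sites be reported as well.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def lipid_attachment_sites(seq):
    """Return 1-based positions of candidate lipid-attachment cysteines."""
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched window.
        cys_pos = m.start() + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites


# Illustrative precursor-like sequence (an assumption, not taken from the text).
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSNAKIDQ"))
```

A hit from such a filter is only a candidate: as noted above, some of the sequences detected by the pattern are not membrane lipoproteins.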

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol.
Chem. 269:14939-14945(1994).

901. Ribosome recycling factor (RRF)

The ribosome recycling factor (RRF / ribosome release factor) dissociates the ribosome from
the mRNA after termination of translation, and is essential for bacterial growth [1]. Thus
ribosomes are "recycled" and ready for another round of protein synthesis. Number of
members: 27

[1] Janosi L, Shimizu I, Kaji A; Medline: 94240115 "Ribosome recycling factor (ribosome
releasing factor) is essential for bacterial growth." Proc Natl Acad Sci U S A 1994;91:4249-4253.



902. S-layer homology (SLH)

S-layers are paracrystalline mono-layered assemblies of (glyco)proteins which coat the
surface of bacteria [1]. Several S-layer proteins and some other cell wall proteins contain one
or more copies of a domain of about 50-60 residues, which has been called SLH (for S-layer 
homology) [2]. There is strong evidence that this domain serves as an anchor to the 
peptidoglycan [3]. The SLH domain has been found in: 

- S-layer glycoprotein of Acetogenium kivui (3 copies). 

- S-layer 125 Kd protein of Bacillus sphaericus (3 copies). 

- S-layer protein of Bacillus anthracis (3 copies). 

- S-layer protein of Bacillus licheniformis (3 copies). 

- S-layer protein (HWP) from Bacillus brevis strain HPD31 (3 copies). 

- Middle cell wall protein (MWP) from Bacillus brevis strain 47 (3 copies). 

- S-layer protein (P100) of Thermus thermophilus (1 copy).

- Outer membrane protein Omp-alpha from Thermotoga maritima (1 copy). 

- Cellulosome anchoring protein (gene ancA), outer layer protein B (OlpB) and a further 
potential cell surface glycoprotein from Clostridium thermocellum (3 copies; the first copy is 
missing its N-terminal third which is appended to the end of the third copy; may have arisen 
by circular permutation). 

- Amylopullulanase (gene amyB) from Thermoanaerobacter thermosulfurogenes (3 copies).

- Amylopullulanase (gene aapT) from Bacillus strain XAL-601 (3 copies). 

- Endoglucanase from Bacillus strain KSM-635 (3 copies). 

- Exoglucanase (gene xynX) from Clostridium thermocellum (3 copies). 

- Xylanase A (gene xynA) from Thermoanaerobacter saccharolyticum (2 copies; 3 copies if a 
frameshift is taken into account). 

- Protein involved in butirosin production (ButB) from Bacillus circulans (2 incomplete 
copies; 3 copies if three frameshifts are taken into account). 

- Two hypothetical proteins from Synechocystis strain PCC 6803 (1 copy each). 

- A hypothetical protein with sequence similarity to amylopullulanases found 3' of the amylase
gene from Bacillus circulans (fragment of 1 copy; 3 copies if two frameshifts are taken into
account).

SLH domains are found at the N- or C-termini of mature proteins. They occur in single copy 
followed by a predicted coiled coil domain, or in three contiguous copies. Structurally, the 
SLH domain is predicted to contain two alpha-helices flanking a beta strand. The SLH 
sequences are fairly divergent with an average identity of about 25%. It is however possible 
to build a sequence pattern that starts at the second position of the domain and that spans 3/4
of its length.

Consensus pattern: [LVFYT (SEQ ID NO:728)]-x-[DA]-x(2,5)-[DNGSATPHY]-
[FYWPDA (SEQ ID NO:730)]-x(4)-[LIV]-x(2)-[GTALV (SEQ ID NO:731)]-x(4,6)-
[LIVFYC (SEQ ID NO:732)]-x(2)-G-x-[PGSTA (SEQ ID NO:733)]-x(2,3)-[MFYA]-x-[PGAV]-
x(3,10)-[LIVMA]-[STKR]-[RY]-x-[EQ]-x-[STALIVM (SEQ ID NO:736)]

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.

[ 1] Beveridge T.J. Curr. Opin. Struct. Biol. 4:204-212(1994).
[ 2] Lupas A., Engelhardt H., Peters J., Santarius U., Volker S., Baumeister W. J. Bacteriol.
176:1224-1233(1994).
[ 3] Lemaire M., Ohayon H., Gounon P., Fujino T., Beguin P. J. Bacteriol. 177:2451-2459(1995).

903. Queuine tRNA-ribosyltransferase (TGT)

This is a family of queuine tRNA-ribosyltransferases EC:2.4.2.29, also known as
tRNA-guanine transglycosylase and guanine insertion enzyme. Queuine tRNA-ribosyltransferase
modifies tRNAs for asparagine, aspartic acid, histidine and tyrosine with queuine. It catalyses
the exchange of guanine-34 at the wobble position with 7-aminomethyl-7-deazaguanine, and
the addition of a cyclopentenediol moiety to 7-aminomethyl-7-deazaguanine-34 tRNA,
giving a hypermodified base queuine in the wobble position [1,2]. The aligned region contains
a zinc binding motif C-x-C-x2-C-x29-H, and important tRNA and
7-aminomethyl-7-deazaguanine binding residues [1]. Number of members: 27
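
The zinc binding motif C-x-C-x2-C-x29-H fixes the spacing between the four putative ligand residues, so it can be searched for directly. The following Python sketch is a hedged illustration: the function name `zinc_ligand_positions` and the test sequence are hypothetical, and a match shows only that the spacing is present, not that zinc is actually bound.

```python
import re

# C-x-C-x2-C-x29-H: three cysteines and one histidine with fixed spacing.
# A lookahead is used so that overlapping occurrences are not skipped.
ZINC_MOTIF = re.compile(r"(?=(C.C.{2}C.{29}H))")


def zinc_ligand_positions(seq):
    """Return 1-based positions (C, C, C, H) for each motif occurrence."""
    hits = []
    for m in ZINC_MOTIF.finditer(seq.upper()):
        s = m.start()  # 0-based index of the first cysteine
        hits.append((s + 1, s + 3, s + 6, s + 36))
    return hits


# Synthetic sequence built only to satisfy the spacing (an assumption).
example = "GG" + "CACAAC" + "A" * 29 + "H" + "GG"
print(zinc_ligand_positions(example))
```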

[1] Romier C, Reuter K, Suck D, Ficner R; Medline: 96256303 "Crystal structure of
tRNA-guanine transglycosylase: RNA modification by base exchange." EMBO J
1996;15:2850-2857.

[2] Garcia GA, Koch KA, Chong S; Medline: 93287116 "tRNA-guanine transglycosylase
from Escherichia coli. Overexpression, purification and quaternary structure." J Mol Biol
1993;231:489-497.

904. ThiC Family (ThiC) 

ThiC is found within the thiamine biosynthesis operon. ThiC is involved in pyrimidine 
biosynthesis [2]. ThiC catalyzes the substitution of the pyrophosphate of 2-methyl-4-amino- 
5-hydroxymethylpyrimidine pyrophosphate by 4-methyl-5-(beta-hydroxyethyl)thiazole 
phosphate to yield thiamine phosphate [3]. Number of members: 12

[1] Vander Horn PB, Backstrom AD, Stewart V, Begley TP; Medline: 93163063 "Structural
genes for thiamine biosynthetic enzymes (thiCEFGH) in Escherichia coli K-12." J Bacteriol
1993;175:982-992.

[2] Begley TP, Downs DM, Ealick SE, McLafferty FW, Van Loon AP, Taylor S,
Campobasso N, Chiu HJ, Kinsland C, Reddick JJ, Xi J; Medline: 99311269 "Thiamin
biosynthesis in prokaryotes." Arch Microbiol 1999;171:293-300.

[3] Zhang Y, Taylor SV, Chiu HJ, Begley TP; Medline: 97284509 "Characterization of the
Bacillus subtilis thiC operon involved in thiamine biosynthesis." J Bacteriol
1997;179:3030-3035.

905. Putative tRNA binding domain (tRNA_bind)

This domain is found in prokaryotic methionyl-tRNA synthetases, prokaryotic
phenylalanyl-tRNA synthetases, the yeast GU4 nucleic-binding protein (G4p1 or p42, ARC1) [2],
human tyrosyl-tRNA synthetase [1], and endothelial-monocyte activating polypeptide II. G4p1
binds specifically to tRNA and forms a complex with methionyl-tRNA synthetases [2]. In human
tyrosyl-tRNA synthetase this domain may direct tRNA to the active site of the enzyme [2].
This domain may perform a common function in tRNA aminoacylation [1]. Number of members: 12
[1] Kleeman TA, Wei D, Simpson KL, First EA; Medline: 97306356 "Human tyrosyl-tRNA
synthetase shares amino acid sequence homology with a putative cytokine." J Biol Chem
1997;272:14420-14425.

[2] Simos G, Segref A, Fasiolo F, Hellmuth K, Shevchenko A, Mann M, Hurt EC; Medline:
97050848 "The yeast protein Arc1p binds to tRNA and functions as a cofactor for the
methionyl- and glutamyl-tRNA synthetases." EMBO J 1996;15:5437-5448.

906. UbiA prenyltransferase family signature (UbiA) 

The following prenyltransferases are evolutionarily related [1,2]:

- Bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA). 

- Yeast mitochondrial para-hydroxybenzoate-polyprenyltransferase (gene COQ2). 

- Protoheme IX farnesyltransferase (heme O synthase) from yeast and mammals (gene 
COX10) and from bacteria (genes cyoE or ctaB). 

These proteins probably contain seven transmembrane segments. The best conserved region 
is located in a loop between the second and third of these segments and was used as a 
signature pattern. 

Consensus pattern: N-x(3)-[DE]-x(2)-[LIF]-D-x(2)-[VM]-x-R-[ST]-x(2)-R-x(4)-G

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.
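
Consensus patterns in this notation map mechanically onto regular expressions: `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(n,m)` is a repeat count. The following Python sketch shows one way to do the translation. The helper name `prosite_to_regex` is hypothetical, and the sketch deliberately ignores the `<` and `>` terminus anchors that the full notation also allows.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            regex = core                       # literal residue or [allowed set]
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


UBIA = "N-x(3)-[DE]-x(2)-[LIF]-D-x(2)-[VM]-x-R-[ST]-x(2)-R-x(4)-G"
print(prosite_to_regex(UBIA))
```

The resulting string can be passed to `re.search` to scan a protein sequence held as a plain string of one-letter residue codes.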

[ 1] Melzer M., Heide L. Biochim. Biophys. Acta 1212:93-102(1994). 
[ 2] Mogi T., Saiki K., Anraku Y. Mol. Microbiol. 14:391-398(1994).

907. Uncharacterized protein family UPF0044 signature (UPF0044) 

The following uncharacterized proteins have been shown [1] to be highly similar: 

- Bacillus subtilis hypothetical protein yqeI.

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus
influenzae protein.

- Methanococcus jannaschii hypothetical protein MJ0652. 
These are small proteins of 10 to 15 Kd. They can be picked up in the database by the
following pattern. This pattern is located in the N-terminal part of these proteins.

Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-
[GA]-x(2)-G

Sequences known to belong to this class detected by the pattern: ALL.

908. ATP synthase (C/AC39) subunit (vATP-synt_AC39) 

This family includes the AC39 subunit from vacuolar ATP synthase Swiss:P32366 [1], and
the C subunit from archaebacterial ATP synthase [2]. The family also includes subunit C
from the sodium-transporting ATP synthase from Enterococcus hirae Swiss:P43456 [3].
Number of members: 12

[1] Bauerle C, Ho MN, Lindorfer MA, Stevens TH; Medline: 93286119 "The Saccharomyces
cerevisiae VMA6 gene encodes the 36-kDa subunit of the vacuolar H(+)-ATPase membrane
sector." J Biol Chem 1993;268:12749-12757.

[2] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968
"Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

[3] Takase K, Kakinuma S, Yamato I, Konishi K, Igarashi K, Kakinuma Y; Medline:
94209269 "Sequencing and characterization of the ntp gene cluster for vacuolar-type
Na(+)-translocating ATPase of Enterococcus hirae." J Biol Chem 1994;269:11037-11044.

909. ATP synthase (E/31 kDa) subunit (vATP-synt_E)

This family includes the vacuolar ATP synthase E subunit [1], as well as the archaebacterial 
ATP synthase E subunit [2]. Number of members: 24 

[1] Foury F; Medline: 91009356 "The 31-kDa polypeptide is an essential subunit of the
vacuolar ATPase in Saccharomyces cerevisiae." J Biol Chem 1990;265:18554-18560.

[2] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968
"Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

910. (WW)

The WW domain [1-4,E1] (also known as rsp5 or WWP) was originally discovered as a
short conserved region in a number of unrelated proteins, among them dystrophin, the product
of the gene responsible for Duchenne muscular dystrophy. The domain, which spans about 35
residues, is repeated up to 4 times in some proteins. It has been shown [5] to bind proteins
with particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp as well as that of a
conserved Pro. It is frequently associated with other domains typical for proteins in signal
transduction processes.

Proteins containing the WW domain are listed below. 

- Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced form 
consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a 
cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms
tetramers and is thought to have multiple functions including involvement in membrane
stability, transduction of contractile forces to the extracellular environment and organization 
of membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of 
Duchenne or Becker type. Dystrophin contains one WW domain C-terminal of the spectrin- 
repeats. 

- Utrophin, a dystrophin-like protein of unknown function. 

- Vertebrate YAP protein is a substrate of an unknown serine kinase. It binds to the SH3 
domain of the Yes oncoprotein via a proline-rich region. This protein appears in alternatively 
spliced isoforms, containing either one or two WW domains [6]. 

- Mouse NEDD-4 plays a role in the embryonic development and differentiation of the 
central nervous system. It contains 3 WW modules followed by a HECT domain. The human 
ortholog contains 4 WW domains, but the third WW domain is probably spliced resulting in 
an alternate NEDD-4 protein with only 3 WW modules [3]. 

- Yeast RSP5 is similar to NEDD-4 in its molecular organization. It contains an N-terminal 
C2 domain (see <PDOC00380>), followed by a histidine-rich region, 3 WW domains and a
HECT domain. 
- Rat FE65, a transcription-factor activator expressed preferentially in liver. The activator
domain is located within the N-terminal 232 residues of FE65, which also contain the WW 
domain. 

- Yeast ESS1/PTF1, a putative peptidyl prolyl cis-trans isomerase from family ppiC (see
<PDOC00840>). A related protein, dodo (gene dod) exists in Drosophila and in mammals
(gene PIN1).

- Tobacco DB10 protein. The WW domain is located N-terminal to the region with similarity 
to ATP-dependent RNA helicases. 

- IQGAP, a human GTPase activating protein acting on ras. It contains an N-terminal
domain similar to fly muscle mp20 protein and a C-terminal ras GTPase activator domain.

- Yeast pre-mRNA processing protein PRP40, Caenorhabditis elegans ZK1098.1 and fission
yeast SpAC13C5.02 are related proteins with similarity to MYO2-type myosin, each
containing two WW-domains at the N-terminus.

- Caenorhabditis elegans hypothetical protein C38D4.5, which contains one WW module, a
PH domain (see <PDOC50003>) and a C-terminal phosphatidylinositol 3-kinase domain.

- Yeast hypothetical protein YFL010c.

For the sensitive detection of WW domains, a profile was developed which spans the whole 
homology region as well as a pattern. 

Consensus pattern: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE (SEQ ID NO:737)]-
[GSTQCR (SEQ ID NO:738)]-[FYW]-x(2)-P

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: 8. Sequences known to belong to this class detected by the
profile: ALL.

[ 1] Bork P., Sudol M. Trends Biochem. Sci. 19:531-533(1994).
[ 2] Andre B., Springael J.Y. Biochem. Biophys. Res. Commun. 205:1201-1205(1994).
[ 3] Hofmann K.O., Bucher P. FEBS Lett. 358:153-157(1995).
[ 4] Sudol M., Chen H.I., Bougeret C., Einbond A., Bork P. FEBS Lett. 369:67-71(1995).
[ 5] Chen H.I., Sudol M. Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823(1995).
[ 6] Sudol M., Bork P., Einbond A., Kastury K., Druck T., Negrini M., Huebner K., Lehman
D. J. Biol. Chem. 270:14733-14741(1995).

911. Xeroderma pigmentosum (XP) [1] (XPG_1)

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease, characterized by a
high incidence of sunlight-induced skin cancer. The skin cells of people with this condition
are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision
repair. There are a minimum of seven genetic complementation groups involved in this
pathway: XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein
called XPG (or XPGC) [2].

XPG belongs to a family of proteins [2,3,4,5,6] that are composed of two main subsets:

- Subset 1, to which belongs XPG, RAD2 from budding yeast and rad13 from fission yeast.
RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in
human DNA nucleotide excision repair [9].

- Subset 2, to which belongs mouse and human FEN-1, rad2 from fission yeast, and RAD27
from budding yeast. FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also includes: 

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway
that corrects mismatched base pairs.

- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.

- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are largely confined to 
two regions. The first is located at the N-terminal extremity (N-region) and corresponds to 
the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the 
C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino
acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the 
conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in 
XPG. The amino acids linking the N- and I-regions are not conserved; indeed, they are 
largely absent from proteins belonging to the second subset. 

Two signature patterns were developed for these proteins. The first corresponds to the central 
part of the N-region, the second to part of the I-region and includes the putative catalytic core 
pentapeptide. 
Consensus pattern: [VI]-[KRE]-P-x-[FYIL (SEQ ID NO:644)]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.

Consensus pattern: [GS]-[LIVM (SEQ ID NO:4)]-[PER]-[FYS]-[LIVM (SEQ ID NO:4)]-x-A-P-x-
E-A-[DE]-[PAS]-[QS]-[CLM]

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.

[ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature
363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R.
Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M.,
Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem.
269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature
371:432-435(1994).

912. 5-formyltetrahydrofolate cyclo-ligase (5-FTHF_cyc-lig)

5-formyltetrahydrofolate cyclo-ligase or methenyl-THF synthetase EC:6.3.3.2 catalyses the
conversion of 5-formyltetrahydrofolate (5-FTHF) to 5,10-methenyltetrahydrofolate; this
requires ATP and Mg2+ [1]. 5-FTHF is used in chemotherapy where it is clinically known as
Leucovorin [2].
Number of members: 23

[1] Dayan A, Bertrand R, Beauchemin M, Chahla D, Mamo A, Filion M, Skup D, Massie B,
Jolivet J; Medline: 96096540 "Cloning and characterization of the human
5,10-methenyltetrahydrofolate synthetase-encoding cDNA." Gene 1995;165:307-311.

[2] Maras B, Stover P, Valiante S, Barra D, Schirch V; Medline: 94308074 "Primary
structure and tetrahydropteroylglutamate binding site of rabbit liver cytosolic
5,10-methenyltetrahydrofolate synthetase." J Biol Chem 1994;269:18429-18433.

913. Cytosolic long-chain acyl-CoA thioester hydrolase (Acyl-CoA hydro) 

This family consists of various cytosolic long-chain acyl-CoA thioester hydrolases, including
those of human and rat [1,2]. The aligned region is repeated within the sequence of human and
rat cytosolic long-chain acyl-CoA thioester hydrolases of this family. Long-chain acyl-CoA
hydrolases hydrolyse palmitoyl-CoA to CoA and palmitate; they also catalyse the hydrolysis
of other long chain fatty acyl-CoA thioesters. Long-chain acyl-CoA hydrolases are present in
all living organisms and they may provide a mechanism for the control of lipid metabolism
[1].

Number of members: 24

[1] Yamada J, Furihata T, Iida N, Watanabe T, Hosokawa M, Satoh T, Someya A, Nagaoka I,
Suga T; Medline: 97236308 "Molecular cloning and expression of cDNAs encoding rat brain
and liver cytosolic long-chain acyl-CoA hydrolases." Biochem Biophys Res Commun
1997;232:198-203.

[2] Broustas CG, Larkins LK, Uhler MD, Hajra AK; Medline: 96209964 "Molecular cloning
and expression of cDNA encoding rat brain cytosolic acyl-coenzyme A thioester hydrolase."
J Biol Chem 1996;271:10470-10476.

914. Agglutinin 

Lectin (probable mannose binding) 

Members of this family are plant lectins. Many if not all are mannose specific.
Number of members: 87 



Reference No. 2750-942P 



[1] Wright CS, Hester G; Medline: 97094989 "The 2.0 A structure of a cross-linked complex
between snowdrop lectin and a branched mannopentaose: evidence for two unique binding
modes." Structure 1996;4:1339-1352.

915. (ANF_RECEPTORS) 

Natriuretic peptides are hormones involved in the regulation of fluid and electrolyte 
homeostasis. These hormones stimulate the intracellular production of cyclic GMP as a 
second messenger. 

Currently, three types of natriuretic peptide receptors are known [1,2]. Two express guanylate 
cyclase activity: GC-A (or ANP-A) which seems specific to atrial natriuretic peptide (ANP), 
and GC-B (or ANP-B) which seems to be stimulated more effectively by brain natriuretic 
peptide (BNP) than by ANP. The third receptor (ANP-C) is probably responsible for the 
clearance of ANP from the circulation and does not play a role in signal transduction. 

GC-A and GC-B are plasma membrane-bound proteins that share the following topology: an 
N-terminal extracellular domain which acts as the ligand binding region, then a 
transmembrane domain followed by a large cytoplasmic C-terminal region that can be
subdivided into two domains: a protein kinase-like domain (see <PDOC00100>) that appears 
important for proper signalling and a guanylate cyclase catalytic domain (see 
<PDOC00425>). The topology of ANP-C is different: like GC-A and -B it possesses an 
extracellular ligand-binding region and a transmembrane domain, but its cytoplasmic domain 
is very short. 

A pattern was developed from the ligand-binding region of natriuretic peptide receptors based 
on a highly conserved region located in the N-terminal part of the domain. 

Consensus pattern: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s)
detected in SWISS-PROT: NONE.

[ 1] Garbers D.L. New Biol. 2:499-504(1990). 

[ 2] Schulz S., Chinkers M., Garbers D.L. FASEB J. 2:2026-2035(1989). 
916. (Apocytochrome)

Cytochrome c family heme-binding site signature 

In proteins belonging to cytochrome c family [1], the heme group is covalently attached by 
thioether bonds to two conserved cysteine residues. The consensus sequence for this site is 
Cys-X-X-Cys-His and the histidine residue is one of the two axial ligands of the heme iron. 
This arrangement is shared by all proteins known to belong to cytochrome c family, which 
presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c. 

Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}

Sequences known to belong to this class detected by the pattern: ALL, except for four
cytochrome c's which lack the first thioether bond. Other sequence(s) detected in
SWISS-PROT: 454.

Note: some cytochrome c's have more than a single bound heme group: c4 has 2, c7 has 3, c3
has 4, the reaction center has 4, and cc3/Hmc has 16!

[ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985). 

917. ATP-synt_A-c. ATP synthase Alpha chain, C terminal 

[1] Medline: 94344236. Structure at 2.8 A resolution of F1-ATPase from bovine heart
mitochondria. Abrahams JP, Leslie AG, Lutter R, Walker JE; Nature 1994;370:621-628.
Number of members: 125 

918. (Basic) 

Myc-type, 'helix-loop-helix' dimerization domain signature
HELIX_LOOP_HELIX 

A number of eukaryotic proteins, which probably are sequence-specific DNA-binding
proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid 
residues. It has been proposed [1] that this domain is formed of two amphipathic helices 
joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH)
domain mediates protein dimerization and has been found in the proteins listed below
[2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid
residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred
to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on
the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or
heterodimerization mediated by the HLH domain is independent of, but necessary for, DNA
binding, as two basic regions are required for DNA binding activity. The HLH proteins
lacking the basic domain (Emc, Id) function as negative regulators since they form
heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also
repress transcription although they can bind DNA. The proteins of this subfamily act together
with co-repressor proteins, like groucho, through their C-terminal motif WRPW.
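
The E-box core sequence 'CANNTG' mentioned above is a simple degenerate DNA motif, with N standing for any base. As a hedged illustration, the following Python sketch lists candidate E-box sites on one strand of a DNA string; the function name `find_eboxes` and the example sequence are hypothetical, and finding a site says nothing about whether a given bHLH protein actually binds it.

```python
import re

# Core E-box consensus from the text: CANNTG (N = any base).
# A lookahead is used so that overlapping sites are also reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(dna):
    """Return (1-based start, site) for every CANNTG match on the given strand."""
    return [(m.start() + 1, m.group(1)) for m in EBOX.finditer(dna.upper())]


# Made-up sequence containing two E-box variants.
print(find_eboxes("GGCACGTGTTCAGCTGAA"))
```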

- The myc family of cellular oncogenes [4], which is currently known to contain four
members: c-myc [E3], N-myc, L-myc, and B-myc. The myc genes are thought to play a role
in cellular differentiation and proliferation.

- Proteins involved in myogenesis (the induction of muscle cells). In mammals MyoD1
(Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in birds CMD1 (QMF-1),
in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila
nautilus (nau).

- Vertebrate proteins that bind specific DNA sequences ('E boxes') in various 
immunoglobulin chains enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4),
TFE3, and TFEB. 

- Vertebrate neurogenic differentiation factor 1 that acts as differentiation factor during
neurogenesis.

- Vertebrate MAX protein, a transcription regulator that forms a sequence-specific
DNA-binding protein complex with myc or mad.

- Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a transcriptional 
repressor and may antagonize myc transcriptional activity by competing for max. 

- Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals,
AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2),
hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal pas domain proteins
(NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2, and
human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator
(ARNT), trachealess protein (TRH), and similar protein (SIMA).

- Mammalian transcription factors HES, which repress transcription by acting on two types 
of DNA sequences, the E box and the N box. 

- Mammalian MAD protein (max dimerizer) which acts as transcriptional repressor and may 
antagonize myc transcriptional activity by competing for max. 

- Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a 
symmetrical DNA sequence that is found in a variety of viral and cellular promoters. 

- Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.

- Human transcription factor AIM. 

- Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E box-dependent
transcription in collaboration with E47. 

- Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an
important role in hemopoietic differentiation. SCL is involved, by chromosomal 
translocation, in stem-cell leukemia. 

- Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic
DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby 
inhibiting binding to DNA. 

- Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ 
patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the
homolog of mammalian Id proteins. 

- Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator 
that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the 
LDLR gene and in other genes. 

- Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and
T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors 
in the peripheral nervous system and the central nervous system. 

- Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. 

- Drosophila atonal protein (ato) which is involved in neurogenesis. 

- Drosophila daughterless (da) protein, which is essential for neurogenesis and
sex-determination.

- Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of 
neurons. 
- Drosophila delilah (dei) protein, which plays an important role in the differentiation of
epidermal cells into muscle. 

- Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic 
segmentation and adult bristle patterning. 

- Drosophila enhancer of split proteins E(spl), which are hairy-like proteins active during
neurogenesis and also act as transcriptional repressors.

- Drosophila twist (twi) protein, which is involved in the establishment of germ layers in 
embryos. 

- Maize anthocyanin regulatory proteins R-S and LC. 

- Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in 
chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromeres 
and in several promoters. 

- Yeast INO2 and INO4 proteins. 

- Yeast phosphate system positive regulatory protein PHO4 which interacts with the 
upstream activating sequence of several acid phosphatase genes. 

- Yeast serine-rich protein TYE7 that is required for Ty-mediated ADH2 expression. 

- Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for 
phosphorus acquisition. 

- Fission yeast protein esc1 which is involved in the sexual differentiation process. 


The schematic representation of the helix-loop-helix domain is shown here: 

xxxxxxxxxxxxxxxxxxxxxxxx-------------------xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1           Loop          Amphipathic helix 2

The signature pattern that had been developed to detect this domain spans completely the 
second amphipathic helix. 

Consensus pattern: [DENSTAP (SEQ ID NO:306)]-[KR]-[LIVMAGSNT]-{FYWCPHKR (SEQ ID 
NO:308)}-[LIVMT (SEQ ID NO:1)]-[LIVM (SEQ ID NO:4)]-x(2)-[STAV (SEQ ID NO:105)]- 
[LIVMSTACKR (SEQ ID NO:309)]-x-[VMFYH (SEQ ID NO:310)]-[LIVMTA (SEQ ID NO:311)]- 
{P}-{P}-[LIVMRKHQ (SEQ ID NO:312)] Sequences known to belong to this class 
detected by the pattern: the majority, but far from all. Other sequence(s) detected in 
SWISS-PROT: 135. 

[ 1] Murre C, McCaw P.S., Baltimore D. Cell 56:777-783(1989). 
[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991). 
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992). 

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990). 
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994). 

919. (Beta-lactamase) 

Beta-lactamases classes -A, -C, and -D active site 

Beta-lactamases (EC 3.5.2.6) [1,2] are enzymes which catalyze the hydrolysis of an amide 
bond in the beta-lactam ring of antibiotics belonging to the penicillin/cephalosporin 
family. Four kinds of beta-lactamase have been identified [3]. Class-B enzymes are 
zinc-containing proteins, whilst class-A, -C and -D enzymes are serine hydrolases. The three 
classes of serine beta-lactamases are evolutionarily related and belong to a superfamily [4] 
that also includes DD-peptidases and a variety of other penicillin-binding proteins (PBPs). 
All these proteins contain a Ser-x-x-Lys motif, where the serine is the active site residue. 
Although clearly homologous, the sequences of the three classes of serine beta-lactamases 
exhibit a large degree of variability, and only a small number of residues are conserved in 
addition to the catalytic serine. 

Since a pattern detecting all serine beta-lactamases would also pick up many unrelated 
sequences, it was decided to provide specific patterns, centered on the active site serine, for 
each of the three classes. 
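
The specificity argument above can be made concrete with a rough estimate. The following is a minimal sketch, not part of the original entry: it assumes the 20 amino acids are equally frequent and independent (a simplification), and the helper names expected_chance_hits and count_sxxk are hypothetical. Under that assumption the bare Ser-x-x-Lys motif is expected about once per 400 residues by chance alone, which is why class-specific patterns are more useful.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expected_chance_hits(length: int) -> float:
    """Expected number of chance S-x-x-K matches in a random sequence.

    Assumption: all 20 amino acids are equally likely at every position.
    """
    windows = max(length - 3, 0)           # a 4-residue motif fits in length-3 windows
    return windows * (1 / 20) * (1 / 20)   # only the S and the K are constrained


def count_sxxk(sequence: str) -> int:
    """Count S-x-x-K occurrences, including overlapping ones."""
    return len(re.findall(r"(?=S..K)", sequence))


# Illustrative random "protein"; the seed is fixed so the run is repeatable.
rng = random.Random(0)
random_protein = "".join(rng.choice(AMINO_ACIDS) for _ in range(300))

print(expected_chance_hits(300))   # about 0.74 chance hits per 300 residues
print(count_sxxk(random_protein))  # observed hits in this one random sequence
```

A 300-residue protein therefore has a fair chance of containing the bare motif without being a beta-lactamase, whereas the longer class-specific patterns constrain many more positions.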

Consensus pattern: [FY]-x-[LIVMFY (SEQ ID NO:18)]-x-S-[TV]-x-K-x(4)- 
[AGLM (SEQ ID NO:739)]-x(2)-[LC] [S is the active site residue] Sequences known 
to belong to this class detected by the pattern: ALL class-A beta-lactamases. Other sequence(s) 
detected in SWISS-PROT: 7. 

Consensus pattern: F-E-[LIVM (SEQ ID NO:4)]-G-S-[LIVMG (SEQ ID 
NO:202)]-[SA]-K [The first S is the active site residue] Sequences known to belong to this 
class detected by the pattern: ALL class-C beta-lactamases. Other sequence(s) detected in 
SWISS-PROT: NONE. 

Consensus pattern: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI] [S is the active site 
residue] Sequences known to belong to this class detected by the pattern: ALL class-D beta- 
lactamases. Other sequence(s) detected in SWISS-PROT: NONE. 
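
Consensus patterns written in this notation can be turned into ordinary regular expressions for scanning sequences. The sketch below is illustrative only and is not the method used to build the patterns: the function name prosite_to_regex is hypothetical, it handles only the syntax seen in this document (x, [..], {..} and repeat counts), and the peptide is a made-up string rather than a real beta-lactamase.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported syntax: x (any residue), [..] (allowed residues),
    {..} (forbidden residues) and repeat counts such as (2) or (3,5).
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Split off an optional repeat count, e.g. "x(3)" or "[LFY](2)".
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # a letter or [..] class is valid as is
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


CLASS_D = "[PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI]"
class_d_regex = re.compile(prosite_to_regex(CLASS_D))

# Hypothetical peptide used only to exercise the pattern.
peptide = "MKKPYSTFKIPNSLIGG"
hit = class_d_regex.search(peptide)
# The active site serine is the third pattern position (offset 2 from the match start).
active_site_index = hit.start() + 2
print(class_d_regex.pattern, active_site_index)
```

The same converter can be applied to the other clean patterns in this section; patterns with variable gaps such as x(33,40) become bounded repeats.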

[ 1] Ambler R.P. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 289:321-331(1980). 

[ 2] Pastor N., Pinero D., Valdes A.M., Soberon X. Mol. Microbiol. 4:1957-1965(1990). 

[ 3] Bush K. Antimicrob. Agents Chemother. 33:259-263(1989). 

[ 4] Joris B., Ghuysen J.-M., Dive G., Renard A., Dideberg O., Charlier P., Frere J.M., Kelly 
J.A., Boyington J.C., Moews P.C., Knox J.R. Biochem. J. 250:313-324(1988). 

920. Biotin protein ligase (BPL) 

Biotin is covalently attached at the active site of certain enzymes that transfer carbon dioxide 
from bicarbonate to organic acids to form cellular metabolites. Biotin protein ligase (BPL) is 
the enzyme responsible for attaching biotin to a specific lysine at the active site of biotin 
enzymes. Each organism probably has only one BPL. Biotin attachment is a two-step 
reaction that results in the formation of an amide linkage between the carboxyl group of 
biotin and the epsilon-amino group of the modified lysine [2]. 
Number of members: 26 

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Medline: 93028443 
"Escherichia coli biotin holoenzyme synthetase/bio repressor crystal structure delineates the 
biotin- and DNA-binding domains." Proc Natl Acad Sci USA 1992;89:9257-9261. 

[2] Chapman-Smith A, Cronan JE Jr; Medline: 10470036 "The enzymatic biotinylation of 
proteins: a post-translational modification of exceptional specificity." Trends Biochem Sci 
1999;24:359-363. 



921. (BRCA2_repeat) 

The alignment covers only the most conserved region of the repeat. 

[1] Bork P, Blomberg N, Nilges M; Medline: 96241568 "Internal repeats in the BRCA2 
protein sequence." Nat Genet 1996;13:22-23. 

Number of members: 63 

922. (C6) 

This domain of unknown function is found in the C. elegans protein Swiss:Q19522. It is 
presumed to be an extracellular domain. The C6 domain contains six conserved cysteine 
residues in most copies of the domain. However, some copies of the domain are missing 
cysteine residues 1 and 3, suggesting that these form a disulphide bridge. 
Number of members: 23 

923. Cadherin cytoplasmic region (Cadherin_C_term) 

Cadherins are vital in cell-cell adhesion during tissue differentiation. Cadherins are linked to 
the cytoskeleton by catenins. Catenins bind to the cytoplasmic tail of the cadherin. Cadherins 
cluster to form foci of homophilic binding units. A key determinant of the strength of the 
binding mediated by cadherins is the juxtamembrane region of the cadherin. This 
region induces clustering and also binds to the protein p120ctn [1]. 
Number of members: 59 

[1] Yap AS, Niessen CM, Gumbiner BM; Medline: 98234411 "The juxtamembrane region of 
the cadherin cytoplasmic tail supports lateral clustering, adhesive strengthening, and 
interaction with p120ctn." J Cell Biol 1998;141:779-789. 

[2] Barth AI, Nathke IS, Nelson WJ; Medline: 97471931 "Cadherins, catenins and APC 
protein: interplay between cytoskeletal complexes and signaling pathways." Curr Opin Cell 
Biol 1997;9:683-690. 

[3] Braga VM, Machesky LM, Hall A, Hotchin NA; Medline: 97327766 "The small GTPases 
Rho and Rac are required for the establishment of cadherin-dependent cell-cell contacts." J 
Cell Biol 1997;137:1421-1431. 

924. Clathrin propeller repeat (Clathrin_propel) 

Clathrin is the scaffold protein of the basket-like coat that surrounds coated vesicles. The 
soluble assembly unit, a triskelion, contains three heavy chains and three light chains in an 
extended three-legged structure. Each leg contains one heavy and one light chain. The N- 
terminus of the heavy chain is known as the globular domain, and is composed of seven 
repeats which form a beta propeller [1]. 
Number of members: 61 

[1] ter Haar E, Musacchio A, Harrison SC, Kirchhausen T; Medline: 99043510 "Atomic 
structure of clathrin: a beta propeller terminal domain joins an alpha zigzag linker." Cell. 
1998;95:563-573. 

925. Respiratory-chain NADH dehydrogenase 30 Kd subunit signature (complex1_30Kd) 

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the 
inner mitochondrial membrane which also seems to exist in the chloroplast and in 
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide 
subunits of this bioenergetic enzyme complex there is one with a molecular weight of 30 
Kd (in mammals) which has been found to be: 

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and in Neurospora 
crassa. 

- Mitochondrial encoded in Paramecium (protein P1), and in the slime mold Dictyostelium 
discoideum (ORF 209). 

- Chloroplast encoded in various higher plants (ORF 159). 

It is also present in bacteria: 

- In the cyanobacterium Synechocystis strain PCC 6803 (gene ndhJ). 

- Subunit C of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoC). 

- Subunit NQO5 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase. 

This protein, in its mature form, consists of 157 to 266 amino acid residues. The 
best conserved region is located in the C-terminal section and can be used as a signature 
pattern. 

Consensus pattern E-R-E-x(2)-[DEH«¥M^^ 

x(3)4KRP]-x-f^^|-{UVM SEQ ID N O:4}]- tf^VM¥£H IiVMYS SEP ID NO:740Y j 
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) 
detected in SWISS-PROT: NONE. 

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987). 

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991). 

926. Respiratory-chain NADH dehydrogenase 49 Kd subunit signature (complex1_49Kd) 

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the 
inner mitochondrial membrane which also seems to exist in the chloroplast and in 
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide 
subunits of this bioenergetic enzyme complex there is one with a molecular weight of 49 Kd 
(in mammals), which is the third largest subunit of complex I and is a component of the 
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 49 
Kd subunit has been found to be: 

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and in Neurospora 
crassa. 

- Mitochondrial encoded in protozoans such as Paramecium (ORF 400), Leishmania and 
Trypanosoma (MURF 3). 

- Chloroplast encoded in various higher plants (ORF 392). 
The 49 Kd subunit is highly similar to [3,4]: 

- Subunit D of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoD). 

- Subunit NQO4 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase. 

- Subunit 5 of Escherichia coli formate hydrogenlyase (gene hycE). 

- Subunit G of Escherichia coli hydrogenase-4 (gene hyfG). 

A highly conserved region, located in the N-terminal section of this subunit, was selected 
as signature pattern. 

Consensus pattern £MVMM]{UVM 

{W¥M^lLIVMlN.SEQiD.N Sequences known to belong to this 

class detected by the pattern: ALL. 

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987). 

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991). 

[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992). 

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 
233:109-122(1993). 

927. (COX2) 

Cytochrome c oxidase (EC 1.9.3.1) [1,2] is an oligomeric enzymatic complex which is a 
component of the respiratory chain and is involved in the transfer of electrons from 
cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial 
inner membrane; in aerobic prokaryotes it is found in the plasma membrane. The enzyme 
complex consists of 3-4 subunits (prokaryotes) to up to 13 polypeptides (mammals). 

Subunit 2 (CO II) transfers the electrons from cytochrome c to the catalytic subunit 1. It 
contains two adjacent transmembrane regions in its N-terminus and the major part of the 
protein is exposed to the periplasmic or to the mitochondrial intermembrane space, 
respectively. CO II provides the substrate-binding site and contains a copper center called 
Cu(A), probably the primary acceptor in cytochrome c oxidase. An exception is the 
corresponding subunit of the cbb3-type oxidase which lacks the copper A redox-center. 
Several bacterial CO II have a C-terminal extension that contains a covalently bound heme c. 

It has been shown [3,4] that nitrous oxide reductase (EC 1.7.99.6) (gene nosZ) of 
Pseudomonas has sequence similarity in its C-terminus to CO II. This enzyme is part of the 
bacterial respiratory system which is activated under anaerobic conditions in the presence of 
nitrate or nitrous oxide. NosZ is a periplasmic homodimer that contains a dinuclear copper 
center, probably located in a 3-dimensional fold similar to the cupredoxin-like fold that has 
been suggested for the copper-binding site of CO II [3]. 

The dinuclear purple copper center is formed by 2 histidines and 2 cysteines [5]. This region 
was used as a signature pattern. The conserved valine and the conserved methionine are said 
to be involved in stabilizing the copper-binding fold by interacting with each other. 

Consensus pattern: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M [The two C's and two H's are 
copper ligands] Sequences known to belong to this class detected by the pattern: ALL, except 
for Paramecium primaurelia as well as in some plants where the pattern ends with Thr; an 
RNA editing event at this position could change this Thr to Met. 
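
As an illustration of how the variable spacer x(33,40) behaves in practice, the following sketch rewrites the pattern as a regular expression with named groups for the four copper ligands. It is an assumption-laden example, not part of the original entry: the function name copper_ligand_positions is hypothetical and the test string is synthetic, not a real CO II sequence.

```python
import re

# Regex form of V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M; the four copper
# ligands (two His, two Cys) are captured as named groups.
COX2_SIGNATURE = re.compile(
    r"V.(?P<his1>H).{33,40}(?P<cys1>C).{3}(?P<cys2>C).{3}(?P<his2>H).{2}M"
)


def copper_ligand_positions(sequence: str):
    """Return 1-based positions of the four ligands, or None if there is no match."""
    match = COX2_SIGNATURE.search(sequence)
    if match is None:
        return None
    return {name: match.start(name) + 1 for name in ("his1", "cys1", "cys2", "his2")}


# Synthetic sequence with a 35-residue spacer between the first His and first Cys.
toy = "GG" + "VAH" + "A" * 35 + "C" + "AAA" + "C" + "AAA" + "H" + "AA" + "M" + "GG"

print(copper_ligand_positions(toy))
# A sequence ending the motif with Thr instead of Met is not detected,
# as noted for Paramecium primaurelia and some plants.
print(copper_ligand_positions(toy.replace("M", "T")))
```

The second call returns None, which mirrors the exception described above for sequences in which the final Met is encoded as Thr before RNA editing.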

Note: cytochrome cbb(3) subunit 2 does not belong to this family. 

[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135- 
148(1983). 

[ 2] Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. J. Bacteriol. 
176:5587-5600(1994). 

[ 3] van der Oost J., Lappalainen P., Musacchio A., Warne A., Lemieux L., Rumbley J., 
Gennis R.B., Aasa R., Pascher T., Malmstrom B.G., Saraste M. EMBO J. 11:3209- 
3217(1992). 

[ 4] Zumft W.G., Dreutsch A., Loechelt S., Cuypers H., Friedrich B., Schneider B. Eur. J. 
Biochem. 208:31-40(1992). 

928. Cytochrome C assembly protein (CytCasm) 

This family consists of various proteins involved in cytochrome c assembly from 
mitochondria and bacteria: CycK from Rhizobium [3], CcmC from E. coli and Paracoccus 
denitrificans [2,1] and orf240 from wheat mitochondria [4]. The members of this family are 
probably integral membrane proteins with six predicted transmembrane helices. It has been 
proposed that members of this family comprise a membrane component of an ABC (ATP 
binding cassette) transporter complex. It is also proposed that this transporter is necessary for 
transport of some component needed for cytochrome c assembly. One member, CycK, 
contains a putative heme-binding motif [3]; orf240 also contains a putative heme-binding 
motif and is a proposed ABC transporter with c-type heme as its proposed substrate [4]. 
However, it seems unlikely that all members of this family transport heme or c-type 
apocytochromes, because CcmC in the putative CcmABC transporter transports neither [1]. 
Number of members: 67 

[1] Page D, Pearce DA, Norris HA, Ferguson SJ; Medline: 97195802 "The Paracoccus 
denitrificans ccmA, B and C genes: cloning and sequencing, and analysis of the potential of 
their products to form a haem or apo-c-type cytochrome transporter." MICROBIOLOGY 
1997;143:563-576. 

[2] Thoeny-Meyer L, Fischer F, Kunzler P, Ritz D, Hennecke H; Medline: 95362656 
"Escherichia coli genes required for cytochrome c maturation." J. BACTERIOL 
1995;177:4321-4326. 

[3] Delgado MJ, Yeoman KH, Wu G, Vargas C, Davies A, Poole RK, Johnston AWB, 
Downie JA; Medline: 95394794 "Characterization of the cycHJKL genes involved in 
cytochrome c biogenesis and symbiotic nitrogen fixation in Rhizobium leguminosarum." J. 
BACTERIOL 1995;177:4927-4934. 

[4] Bonnard G, Grienenberger JM; Medline: 95124303 "A gene proposed to encode a 
transmembrane domain of an ABC transporter is expressed in wheat mitochondria." MOL. 
GEN. GENET 1995;246:91-99. 

929. Cytochrome b559 subunits heme-binding site signature (cytochr_b559) 

Cytochrome b559 [1] is an essential component of photosystem II complex from oxygenic 
photosynthetic organisms. It is an integral thylakoid membrane protein composed of two 
subunits, alpha (gene psbE) and beta (gene psbF), each of which contains a histidine residue 
located in a transmembrane region. The two histidines coordinate the heme iron of 
cytochrome b559. 

The region around the heme-binding residue of both subunits is very similar and can be used 
as a signature pattern. 



Reference No. 2750-942P 




Consensus pattern: [LIV]-x-[ST]-[LIVF (SEQ ID NO:127)]-R-[FYW]-x(2)-[IV]-H- 
[STGA (SEQ ID NO:741)]-[LIV]-[STGA (SEQ ID NO:741)]-[IV]-P [H is the 
heme iron ligand] Sequences known to belong to this class detected by the pattern: ALL. Other 
sequence(s) detected in SWISS-PROT: NONE. 

[ 1] Pakrasi H.B., de Ciechi P., Whitmarsh J. EMBO J. 10:1619-1627(1991). 

930. Cytochrome b/b6 signatures (Cytochromeb) 

In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component 
of respiratory chain complex III (EC 1.10.2.2) - also known as the bc1 complex or ubiquinol- 
cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an analogous protein, 
cytochrome b6, a component of the plastoquinone-plastocyanin reductase (EC 1.10.99.1), 
also known as the b6f complex. 

Cytochrome b/b6 [1,2] is an integral membrane protein of approximately 400 amino acid 
residues that probably has 8 transmembrane segments. In plants and cyanobacteria, 
cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence 
of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD 
corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two heme groups, 
known as b562 and b566. Four conserved histidine residues are postulated to be the ligands 
of the iron atoms of these two heme groups. 

Apart from regions around some of the histidine heme ligands, there are a few conserved 
regions in the sequence of b/b6. The best conserved of these regions includes an invariant P- 
E-W triplet which lies in the loop that separates the fifth and sixth transmembrane segments. 
It seems to be important for electron transfer at the ubiquinone redox site - called Qz or Qo 
(where o stands for outside) - located on the outer side of the membrane. 

A schematic representation of the structure of cytochrome b/b6 is shown below. 

           +---Fe-b562----+
           | +--Fe-b566---|-+
           | |            | |
xxxxxxxxxxxHxHxxxxxxxxxxxxHxHxxxxxxxxxxPEWxxxxxxxxxxxxxxxxxx
<-----------------------Cytochrome-b----------------------->
<-------Cytochrome-b6-petB-------><---Cytochrome-b6-petD--->

Two signature patterns were developed for cytochrome b/b6. The first includes the first 
conserved histidine of b/b6, which is a heme b562 ligand; the second includes the conserved 
PEW triplet. 

Consensus pattern F&SNQI iDENO SEP ID "NOB? \ )1-x(3)-G-{WWMO}f FYWMO SEP ID 
NO:742jl-x-j^^ [H is a heme b562 ligand] 

Sequences known to belong to this class detected by the pattern: ALL, except for 5 sequences. 

Consensus pattern: P-[DE]-W-[FY]-[LFY](2) Sequences known to belong to this class 
detected by the pattern: ALL, except for Odocoileus hemionus (mule deer) and Paramecium 
tetraurelia cytochrome b. 

[ 1] Howell N. J. Mol. Evol. 29:157-169(1989). 

[ 2] Esposti M.D., de Vries S., Crimi M., Ghelli A., Patarnello T., Meyer A. Biochim. 
Biophys. Acta 1143:243-271(1993). 

931. Phorbol esters / diacylglycerol binding domain (DAG_PE-bind) 

Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues 
of DAG and potent tumor promoters that cause a variety of physiological changes when 
administered to both cells and tissues. DAG activates a family of serine/threonine protein 
kinases, collectively known as protein kinase C (PKC) [1]. Phorbol esters can directly 
stimulate PKC. The N-terminal region of PKC, known as C1, has been shown [2] to bind PE 
and DAG in a phospholipid and zinc-dependent fashion. The C1 region contains one or two 
copies (depending on the isozyme of PKC) of a cysteine-rich domain about 50 amino-acid 
residues long and essential for DAG/PE-binding. Such a domain has also been found in the 
following proteins: 




- Diacylglycerol kinase (EC 2.7.1.107) (DGK) [3], the enzyme that converts DAG into 
phosphatidate. It contains two copies of the DAG/PE-binding domain in its N-terminal 
section. At least five different forms of DGK are known in mammals. 

- N-chimaerin. A brain specific protein which shows sequence similarities with the BCR 
protein at its C-terminal part and contains a single copy of the DAG/PE-binding domain at its 
N-terminal part. It has been shown [4,5] to be able to bind phorbol esters. 

- The raf/mil family of serine/threonine protein kinases. These protein kinases contain a 
single N-terminal copy of the DAG/PE-binding domain. 

- The unc-13 protein from Caenorhabditis elegans. Its function is not known but it contains a 
copy of the DAG/PE-binding domain in its central section and has been shown to bind 
specifically to a phorbol ester in the presence of calcium [6]. 

- The vav oncogene. Vav was generated by a genetic rearrangement during gene transfer 
assays. Its expression seems to be restricted to cells of hematopoietic origin. Vav seems [5,7] 
to contain a DAG/PE-binding domain in the central part of the protein. 

- The Drosophila GTPase activating protein rotund. 

The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are 
probably the six cysteines and two histidines that are conserved in this domain. A signature 
pattern was developed that spans completely the DAG/PE domain. 

Consensus pattern: H-x-[LIVMFYW (SEQ ID NO:26)]-x(8,11)-C-x(2)-C-x(3)- 
[LIVMFC (SEQ ID NO:90)]-x(5,10)-C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C [All 
the C and H are involved in binding zinc] Sequences known to belong to this class detected 
by the pattern: ALL, except a few DGKs. 

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992). 

[ 2] Ono Y., Fujii T., Igarashi K., Kuno T., Tanaka C., Kikkawa U., Nishizuka Y. Proc. Natl. 
Acad. Sci. U.S.A. 86:4868-4871(1989). 

[ 3] Sakane F., Yamada K., Kanoh H., Yokoyama C, Tanabe T. Nature 344:345-348(1990). 
[ 4] Ahmed S., Kozma R., Monfries C, Hall C, Lim H.H., Smith P., Lim L. Biochem. J. 
272:767-773(1990). 

[ 5] Ahmed S., Kozma R., Lee J., Monfries C, Harden N., Lim L. Biochem. J. 280:233- 
241(1991). 

[ 6] Ahmed S., Maruyama I.N., Kozma R., Lee J., Brenner S., Lim L. Biochem. J. 287:995- 
999(1992). 

[ 7] Boguski M.S., Bairoch A., Attwood T.K., Michaels G.S. Nature 358:113-113(1992). 

932. 3-dehydroquinate synthase (DHQ_synthase) 

[1] Barten R, Meyer TF; Medline: 98273626 "Cloning and characterisation of the Neisseria 
gonorrhoeae aroB gene." Mol Gen Genet 1998;258:34-44. 

[2] Hawkins AR, Lamb HK; Medline: 96048023 "The molecular biology of multidomain 
proteins. Selected examples." Eur J Biochem 1995;232:7-18. 

The 3-dehydroquinate synthase EC:4.6.1.3 domain is present in isolation in various bacterial 
3-dehydroquinate synthases and also present as a domain in the pentafunctional AROM 
polypeptide Swiss:P07547 [2]. 3-dehydroquinate (DHQ) synthase catalyses the formation of 
dehydroquinate (DHQ) and orthophosphate from 3-deoxy-D-arabino-heptulosonate 
7-phosphate [1]. This reaction is part of the shikimate pathway which is involved in the 
biosynthesis of aromatic amino acids. 
Number of members: 25 

933. Dihydrofolate reductase signature (DiHfolate_red) 

Dihydrofolate reductases (EC 1.5.1.3) [1] are ubiquitous enzymes which catalyze the 
reduction of folic acid into tetrahydrofolic acid. They can be inhibited by a number of 
antagonists such as trimethoprim and methotrexate which are used as antibacterial or 
anticancer agents. A signature pattern was derived from a region in the N-terminal part of 
these enzymes, which includes a conserved Pro-Trp dipeptide; the tryptophan has been 
shown [2] to be involved in the binding of substrate by the enzyme. 

Consensus pattern: [LVAGC]-[LIF]-G-x(4)-[LIVMF (SEQ ID NO:2)]-P-W-x(4,5)-[DE]-x(3)- 
[FYIV (SEQ ID NO:744)]-x(3)-[STIQ] Sequences known to belong to this class detected by 
the pattern: ALL, except for type II bacterial, plasmid-encoded dihydrofolate reductases, 
which do not belong to the same class of enzymes. 

[ 1] Harpers' Review of Biochemistry, Lange, Los Altos (1985). 

[ 2] Bolin J.T., Filman D.J., Matthews D.A., Hamlin R.C., Kraut J. J. Biol. Chem. 257:13650- 
13662(1982). 

934. (DIL) 

[1] Ponting CP; Medline: 95397417 "AF-6/cno: neither a kinesin nor a myosin, but a bit of 
both." Trends Biochem Sci 1995;20:265-266. 

Number of members: 31 

935. (DNA_gyraseB_C) 

DNA topoisomerase II signature (cross-reference = TOPOISOMERASE_II) 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that 
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP- 
dependent and act by passing a DNA segment through a transient double-strand break. 
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African 
Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits 
(the product of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known 
as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a 
second type II topoisomerase has been identified; it is known as topoisomerase IV and is 
required for chromosome segregation; it also consists of two subunits (genes parC and parE). 
In eukaryotes, type II topoisomerase is a homodimer. 

There are many regions of sequence homology between the different subtypes of 
topoisomerase II. The relation between the different subunits is shown in the following 
representation: 

<---------------About-1400-residues-------------->

[----Protein 39-*----][--------Protein 52--------]  Phage T4
[----gyrB-------*----][--------gyrA--------------]  Prokaryote II, Archaebacteria
[----parE-------*----][--------parC--------------]  Prokaryote IV
[---------------*--------------------------------]  Eukaryote and ASF

'*': Position of the pattern. 


As a signature pattern for this family of proteins, a region was selected that contains a highly 
conserved pentapeptide. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 

Consensus pattern: [LIVMA (SEQ ID NO:30)]-x-E-G-[DN]-S-A-x-[STAG] 
Sequences known to belong to this class detected by the pattern: ALL. 

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 
[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 
[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). 
[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995). 

936. (DNA_topoisoIV) 

DNA topoisomerase II signature (cross-reference = TOPOISOMERASE_II) 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that 
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP- 
dependent and act by passing a DNA segment through a transient double-strand break. 
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African 
Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits 
(the product of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known 
as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a 
second type II topoisomerase has been identified; it is known as topoisomerase IV and is 
required for chromosome segregation; it also consists of two subunits (genes parC and parE). 
In eukaryotes, type II topoisomerase is a homodimer. 




There are many regions of sequence homology between the different subtypes of 
topoisomerase II. The relation between the different subunits is shown in the following 
representation: 

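<---------------About-1400-residues-------------->

[----Protein 39-*----][--------Protein 52--------]  Phage T4
[----gyrB-------*----][--------gyrA--------------]  Prokaryote II, Archaebacteria
[----parE-------*----][--------parC--------------]  Prokaryote IV
[---------------*--------------------------------]  Eukaryote and ASF

'*': Position of the pattern. 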

As a signature pattern for this family of proteins, a region was selected that contains a highly 
conserved pentapeptide. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 

Consensus pattern: [LIVMA (SEQ ID NO:30)]-x-E-G-[DN]-S-A-x-[STAG] 
Sequences known to belong to this class detected by the pattern: ALL. 

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). 

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995). 

937. Prolyl oligopeptidase family serine active site (DPPIV_N_term) 

The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related 
peptidases whose catalytic activity seems to be provided by a charge relay system similar to 
that of the trypsin family of serine proteases, but which evolved by independent convergent 
evolution. The known members of this family are listed below. 

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is 
an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence 
of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium 
meningosepticum and Aeromonas hydrophila); there is a high degree of sequence 
conservation between these sequences. 

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves 
peptide bonds on the C-terminal side of lysyl and argininyl residues. 

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N- 
terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided 
that the penultimate residue is proline. 

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible 
for the proteolytic maturation of the alpha-factor precursor. 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2). 

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme 
catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to 
generate a N-acetylated amino acid and a protein with a free amino-terminus. 

A conserved serine residue has experimentally been shown (in E. coli protease II as well as in 
pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part 
of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C- 
terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 
amino acids). 

Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW (SEQ ID NO:26)]-x(14)-G-x-S- 
x-G-G-[LIVMFYW (SEQ ID NO:26)](2) [S is the active site residue] Sequences 
known to belong to this class detected by the pattern: ALL, except for yeast DPAP A. 

Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases 
[4,E1]. 

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992). 
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

938. Deoxyhypusine synthase (DS)

Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid,
hypusine [N epsilon-(4-aminobutyl-2-hydroxy)lysine]. The first step in the
post-translational formation of hypusine is catalysed by the enzyme
deoxyhypusine synthase (DS) EC:1.1.1.249. The modified version of eIF-5A,
and DS, are required for eukaryotic cell proliferation [1]. 
Number of members: 9 

[1] Liao DI, Wolff EC, Park MH, Davies DR; Medline: 98154315 "Crystal structure of the 
NAD complex of human deoxyhypusine synthase: an enzyme with a ball-and-chain 
mechanism for blocking the active site." Structure 1998;6:23-32. 



939. (DUF21) 

Many of the sequences in this family are annotated as hemolysins; however, this is due to a
similarity to Swiss:Q54318 that does not contain this domain. This domain is found in the
N-terminus of the proteins adjacent to two intracellular CBS domains.
Number of members: 42 

940. (DUF59) 

This family includes prokaryotic proteins of unknown function. The family also includes
PhaH Swiss:O84984 from Pseudomonas putida. PhaH forms a complex with PhaF
Swiss:O84982, PhaG Swiss:O84983 and PhaI Swiss:O84985, which hydroxylates
phenylacetic acid to 2-hydroxyphenylacetic acid [1]. So members of this family may all be
components of ring hydroxylating complexes.
Number of members: 15

[1] Olivera ER, Minambres B, Garcia B, Muniz C, Moreno MA, Ferrandez A, Diaz E, Garcia 
JL, Luengo JM; Medline: 98263372 "Molecular characterization of the phenylacetic acid
catabolic pathway in Pseudomonas putida U: the phenylacetyl-CoA catabolon." Proc Natl
Acad Sci U S A 1998;95:6419-6424. 

941. (DUF82) 

The protein contains four conserved cysteines that may be involved in metal binding or 
disulphide bridges. 
Number of members: 4 

942. Riboflavin kinase / FAD synthetase (FAD_Synth) 

This family consists of part of the bifunctional enzyme riboflavin kinase / FAD synthetase.
These enzymes have both ATP:riboflavin 5'-phosphotransferase and
ATP:FMN-adenylyltransferase activities [1]. They catalyse the 5'-phosphorylation of riboflavin to FMN
and the adenylylation of FMN to FAD [1].

CAUTION: It is not clear if this region of the enzymes catalyses either or both of the 
enzymatic reactions. 
Number of members: 27 

[1] Manstein DJ, Pai EF; Medline: 87057286 "Purification and characterization of FAD 
synthetase from Brevibacterium ammoniagenes." J Biol Chem 1986;261:16169-16173. 

943. [2Fe-2S] binding domain (fer2_2) 

[1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber
R; Medline: 96072968 "Crystal structure of the xanthine oxidase-related aldehyde
oxido-reductase from D. gigas." Science 1995;270:1170-1176.
Number of members: 53 

944. Filovirus glycoprotein (Filo_glycop)



This family includes an extracellular region from the envelope glycoprotein of Ebola and 
Marburg viruses. This region is also produced as a separate transcript that gives rise to a
non-structural, secreted glycoprotein, which is produced in large amounts and has an unknown
function [1]. Processing of this protein may be involved in viral pathogenicity [2]. 
Number of members: 23 

[1] Volchkov VE, Feldmann H, Volchkova VA, Klenk HD; Medline: 98245155 "Processing 
of the Ebola virus glycoprotein by the proprotein convertase furin." Proc Natl Acad Sci U S 
A 1998;95:5762-5767. 

[2] Sanchez A, Trappier SG, Mahy BW, Peters CJ, Nichol ST; Medline: 96195018 "The 
virion glycoproteins of Ebola viruses are encoded in two reading frames and are expressed 
through transcriptional editing." Proc Natl Acad Sci U S A 1996;93:3602-3607. 

945. Frataxin-like domain (Frataxin_Cyay)

This family contains proteins that have a domain related to the globular C-terminus of
frataxin, the protein that is mutated in Friedreich's ataxia. This domain is found in a family of
bacterial proteins. The function of this domain is currently unknown. 
Number of members: 12 

[1] Gibson TJ, Koonin EV, Musco G, Pastore A, Bork P; Medline: 97084946 "Friedreich's 
ataxia protein: phylogenetic evidence for mitochondrial dysfunction." Trends Neurosci 
1996;19:465-468. 

946. (GAF) 

Domain present in phytochromes and cGMP-specific phosphodiesterases. 
Number of members: 296 

[1] Aravind L, Ponting CP; Medline: 98094688 "The GAF domain: an evolutionary link 
between diverse phototransducing proteins." Trends Biochem Sci 1997;22:458-459. 



947. Galaptin signature (Gal-bind_lectin)

All vertebrates synthesize soluble galactoside-binding lectins [1,2,3] (also known as
galectins, galaptins or S-lectin). These carbohydrate-binding proteins are developmentally
regulated. Although their exact physiological role is not yet clear, they seem to be involved in
differentiation, cellular regulation and tissue construction. The sequence of
galactoside-binding lectins from electric eel (electrolectin), conger eel (congerin), chicken and a number
of mammalian species is known. These lectins are proteins of about 130 to 140 amino acid
residues (14 Kd to 16 Kd).

A number of other proteins are known to belong to this family:

- Galectin-3 (also known as MAC-2 antigen, CBP-35 or IgE-binding protein), a 35 Kd lectin
which binds immunoglobulin E and which is composed of two domains: an N-terminal domain
that consists of tandem repeats of a glycine/proline-rich sequence and a C-terminal galaptin
domain.

- Galectin-4 [4], which is composed of two galaptin domains.

- Galectin-5.

- Galectin-7 [5], a keratinocyte protein which could be involved in cell-cell and/or
cell-matrix interactions necessary for normal growth control.

- Galectin-8 [6], which is composed of two galaptin domains.

- Galectin-9 [7], which is composed of two galaptin domains.

- Human eosinophil lysophospholipase (EC 3.1.1.5) [8] (Charcot-Leyden crystal protein), a
protein that may have both enzymatic and lectin activities. It forms hexagonal
bipyramidal crystals in tissues and secretions from sites of eosinophil-associated
inflammation.

- Caenorhabditis elegans 32 Kd lactose-binding lectin [9]. This lectin is composed of two
galaptin domains.

- Caenorhabditis elegans lec-7 and lec-8.

One of the conserved regions of these lectins contains a tryptophan that has been shown [10]
to be essential to the binding of galactosides. This region was used as a signature pattern for
these proteins.

Consensus pattern W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF SEQ ID NO:746]-
x(3)-P3^&^ [W
binds carbohydrate] Sequences known to belong to this class detected by the pattern ALL,
except for pig galectin 4.

[ 1] Barondes S.H., Gitt M.A., Leffler H., Cooper D.N.W. Biochimie 70:1627-1632(1988). 
[ 2] Hirabayashi J., Kasai K.-I. J. Biochem. 104:1-4(1988). 

[ 3] Barondes S.H., Castronovo V., Cooper D.N.W., Cummings R.D., Drickamer K., Feizi
T., Gitt M.A., Hirabayashi J., Hughes C., Kasai K.-I., Leffler H., Liu F.-T., Lotan R.,
Mercurio A.M., Monsigny M., Pillair S., Poirer F., Raz A., Rigby P.W.J., Rini J.M., Wang
J.L. Cell 76:597-598(1994).

[ 4] Oda Y., Herrmann J., Gitt M., Turck C.W., Burlingame A.L., Barondes S.H., Leffler H.
J. Biol. Chem. 268:5929-5939(1993). 

[ 5] Madsen P., Rasmussen H.H., Flint T., Gromov P., Kruse T.A., Honore B., Vorum H., 
Celis J.E. J. Biol. Chem. 270:5823-5829(1995). 

[ 6] Hadari Y.R., Paz K., Dekel R., Mestrovic T., Accili D., Zick Y. J. Biol. Chem. 270:3447- 
3453(1995). 

[ 7] Wada J., Kanwar Y.S. J. Biol. Chem. 272:6078-6086(1997). 

[ 8] Ackerman S.J., Corrette S.E., Rosenberg H.F., Bennett J.C., Mastrianni D.M.,
Nicholson-Weller A., Weller P.F., Chin D.T., Tenen D.G. J. Immunol. 150:456-468(1993).

[ 9] Hirabayashi J., Satoh M., Kasai K.-I. J. Biol. Chem. 267:15485-15490(1992). 

[10] Abbott W.M., Feizi T. J. Biol. Chem. 266:5552-5557(1991). 

948. (GARS) Phosphoribosylglycinamide synthetase signature (phosphoribosylamine glycine 
ligase) 

PROSITE: PDOC00164; cross-reference(s): PS00184 

Phosphoribosylglycinamide synthetase (GARS) [1] catalyzes the second step in the de novo
biosynthesis of purine, the ATP-dependent addition of 5-phosphoribosylamine to glycine to
form 5'-phosphoribosylglycinamide.

In bacteria GARS is a monofunctional enzyme (encoded by the purD gene), in yeast it is part
of a bifunctional enzyme (encoded by the ADE5,7 gene), in higher eukaryotes it is part, with
AIRS and with phosphoribosylglycinamide formyltransferase (GART), of a trifunctional
enzyme (GARS-AIRS-GART).

The sequence of GARS is well conserved. A highly conserved octapeptide was
selected as a signature pattern.

Consensus pattern R-F-G-D-P-E-x-[QM]

Sequences known to belong to this class detected by the pattern ALL.

[1] Aiba A., Mizobuchi K. J. Biol. Chem. 264:21239-21246(1989).
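
A consensus pattern written in this notation can be translated mechanically into a regular expression. The following is a minimal sketch, assuming the usual PROSITE conventions (x for any residue, square brackets for allowed residues, braces for excluded residues, a parenthesised count or range for repeats). The function name prosite_to_regex and the test sequence are hypothetical and serve only to illustrate the GARS octapeptide given above.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat suffix such as (3) or (3,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            token = core                      # single residue or [ABC] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# GARS signature as given in the entry above.
gars_regex = prosite_to_regex("R-F-G-D-P-E-x-[QM]")

# Hypothetical toy sequence, not a real GARS protein.
toy_sequence = "MKARFGDPEAQLL"
hit = re.search(gars_regex, toy_sequence)
```

Here hit.start() gives the zero-based position of the first residue of the octapeptide in the toy sequence.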

949. GLTT - GLTT repeat (12 copies) 

This short repeat of unknown function is found in multiple copies in several C. elegans 
proteins. The repeat is five residues long and consists of XGLTT where X can be any amino 
acid. Number of members: 34. 
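
The XGLTT definition above maps directly onto a five-residue regular expression. The following is a minimal sketch of how the copies of the repeat in a sequence could be located; the function name gltt_repeats and the toy sequence are hypothetical and do not correspond to any real C. elegans protein.

```python
import re

# X-G-L-T-T, where X can be any amino acid.
GLTT_REPEAT = re.compile(r".GLTT")


def gltt_repeats(sequence):
    """Return (start, repeat) for each non-overlapping XGLTT repeat."""
    return [(m.start(), m.group()) for m in GLTT_REPEAT.finditer(sequence)]


# Hypothetical toy sequence with three tandem copies of the repeat.
toy = "AGLTTSGLTTPGLTTK"
repeats = gltt_repeats(toy)
```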

950. Glu_synthase - Conserved region in glutamate synthase 

This family represents a region of the glutamate synthase protein. This region is expressed as 
a separate subunit in the glutamate synthase alpha subunit from archaebacteria, or part of a
large multidomain enzyme in other organisms. The aligned region of these proteins contains a 
putative FMN binding site and Fe-S cluster. Number of members: 44. 

[1] Medline: 97082505. Sequence of the GLT1 gene from Saccharomyces cerevisiae reveals 
the domain structure of yeast glutamate synthase. Filetici P, Martegani MP, Valenzuela L, 
Gonzalez A, Ballario P; Yeast 1996;12:1359-1366. 

951. (Glyco_hydro_2) Glycosyl hydrolases family 2 signatures 
GLYCOSYL_HYDROL_F2_1; PS00608; GLYCOSYL_HYDROL_F2_2
It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

-Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and 
ebgA), Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella 
pneumoniae, Lactobacillus delbrueckii, or Streptococcus thermophilus and from the fungi 
Kluyveromyces lactis. 

-Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from mammals.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid
residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base
catalyst in the active site of the enzyme. This region has been used as a signature pattern. A
highly conserved region located some sixty residues upstream from the active site glutamate
has been selected as a second signature pattern.

Consensus pattern N-x-[L-l-VMP¥ 

Mi)i(2W3)-[DN]-x(2)-G-p¥MF¥^ 

known to belong to this class detected by the pattern ALL. 

Consensus pattern g>E^QfeF}fDENQLF S EP ID NO:30 2^-pER\3^{Wy w SEQ ID 
NOj3m)l-N.[HRY]-{^vmq^ SEP ID N(-):74qYj -[SAC]-jWMP ^|TJVMFS 

mmNai32)l(3)-W-[GS]-x(2,3)-N-E [E is the active site residue] Sequences known to 
belong to this class detected by the pattern ALL, except for Rhizobium meliloti lacZ. 

[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol.
137:369-380(1991).

[3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

952. (Glyco_hydro_3) Glycosyl hydrolases family 3 active site

PROSITE: PDOC00621. PROSITE cross-reference(s): PS00775; GLYCOSYL_HYDROL_F3
It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

-Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula 
anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2),
Schizophyllum commune and Trichoderma reesei (BGL1). 

-Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbgl), Butyrivibrio 
fibrisolvens (bglA), Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia 
chrysanthemi (bgxA) and Ruminococcus albus. 
-Alteromonas strain G-7 beta-hexosaminidase A (EC 3.2.1.52). 
-Bacillus subtilis hypothetical protein yzbA. 

-Escherichia coli hypothetical protein ycfG and HI0959, the corresponding Haemophilus
influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic
acid residue which has been shown [3], in Aspergillus wentii beta-glucosidase A3, to be 
implicated in the catalytic mechanism. This region was used as a signature pattern. 

Consensus pattemffefVMjEIY M...SEQ ED NQ:4)](2>[KR]-v-[Fr>K]-x(4)-G- 
rWMFF|fLTVMFT SEP ID NO-T*?>HMV fflLIVT SEP TP NO-l^^ 1 -p-u^ UJ ^ 
SEP ID NP:2)3-[ST]-D-x(2)-fS€AeNHfSGAPNI SEP IP NO-9^Yj [D is the active site ~~ 
residue] 

Sequences known to belong to this class detected by the pattern ALL.

[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Castle L.A., Smith K.D., Morris R.P. J. Bacteriol. 174:1478-1486(1992).

[3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

953. GP120 - Envelope glycoprotein GP120 

The entry of HIV requires interaction of viral GP120 with Swiss:P01730 and a chemokine 
receptor on the cell surface. Number of members: 17891

[1] Medline: 98303379. Structure of an HIV gp120 envelope glycoprotein in complex with
the CD4 receptor and a neutralizing human antibody. Kwong PD, Wyatt R, Robinson J, 
Sweet RW, Sodroski J, Hendrickson WA; Nature 1998;393:648-659. 

954. (GSPIIE) Bacterial type II secretion system protein E signature 
PROSITE: PDOC00567. PROSITE cross-reference(s): PS00662; T2SP_E

A number of bacterial proteins, some of which are involved in a general secretion 
pathway (GSP) for the export of proteins (also called the type II pathway) [1,2], have been 
found to be evolutionarily related. These proteins are listed below:

-The 'E' protein from the GSP operon of: Aeromonas (gene exeE); Erwinia (gene outE);
Escherichia coli (gene yheG); Klebsiella pneumoniae (gene pulE); Pseudomonas aeruginosa
(gene xcpR); Vibrio cholerae (gene epsE) and Xanthomonas campestris (gene xpsE).
-Agrobacterium tumefaciens Ti plasmid virB operon protein 11. This protein is required for
the transfer of T-DNA to plants.
-Bacillus subtilis comG operon protein 1 which is required for the uptake of DNA by
competent Bacillus subtilis cells.
-Aeromonas hydrophila tapB, involved in type IV pilus assembly.
-Pseudomonas protein pilB, which is essential for the formation of the pili.
-Pseudomonas aeruginosa twitching motility protein pilT.
-Neisseria gonorrhoeae type IV pilus assembly protein pilF.
-Vibrio cholerae protein tcpT, which is involved in the biosynthesis of the tcp pilus.
-Escherichia coli protein hofB (hopB).
-Escherichia coli hypothetical protein ygcB.
-Escherichia coli hypothetical protein yggR.

These proteins have from 344 (pilT and virB11) to 568 (tapB) amino acids; they are
probably cytoplasmically located and, on the basis of the presence of a conserved P-loop
region (see <PDOC00017>), probably bind ATP. A region that overlaps the 'B' motif of
ATP-binding proteins was selected as a signature pattern.

Consensus pattemfWMHLrVM SEP ID NO :4^-R-x(2)-P-D-x4ymyq fIJVM S EP ID 

N04a](3)-G-E-fM¥M4 ITJVM SEP fD NO:4\| -P-n 

Sequences known to belong to this class detected by the pattern ALL, except for ygcB.

[1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[2] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).

955. (guanylate cyc) Guanylate cyclases signature 

PROSITE: PDOC00425. PROSITE cross-reference(s): PS00452; GUANYLATE_CYCLASES

Guanylate cyclases (EC 4.6.1.2) [1 to 4] catalyze the formation of cyclic GMP (cGMP) from
GTP. cGMP acts as an intracellular messenger, activating cGMP-dependent kinases and
regulating cGMP-sensitive ion channels. The role of cGMP as a second messenger in vascular
smooth muscle relaxation and retinal phototransduction is well established. Guanylate
cyclase is found both in the soluble and particulate fractions of eukaryotic cells. The
soluble and plasma membrane-bound forms differ in structure, regulation and other properties.

Most currently known plasma membrane-bound forms are receptors for small
polypeptides. The topology of such proteins is the following: they have an N-terminal
extracellular domain which acts as the ligand binding region, then a transmembrane domain,
followed by a large cytoplasmic C-terminal region that can be subdivided into two domains: a
protein kinase-like domain that appears important for proper signalling and a cyclase catalytic
domain. This topology is schematically represented below.

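+------------------------xxxxx------------------------------------+
| Ligand-binding         xxxxx   Protein kinase-like  |  Cyclase  |
+------------------------xxxxx------------------------------------+
  Extracellular      Transmembrane        Cytoplasmic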

The known guanylate cyclase receptors are: 

-The sea urchin receptors for speract and resact, which are small peptides that stimulate
sperm motility and metabolism. 

-The receptors for natriuretic peptides (ANF). Two forms of ANF receptors with guanylate 
cyclase activity are currently known: GC-A (or ANP-A) which seems specific to atrial 
natriuretic peptide (ANP), and GC-B (or ANP-B) which seems to be stimulated more 
effectively by brain natriuretic peptide (BNP) than by ANP. 

-The receptor for Escherichia coli heat-stable enterotoxin (GC-C). The endogenous ligand
for this intestinal receptor seems to be a small peptide called guanylin.

-Retinal guanylate cyclase (retGC) which probably plays a specific functional role in the
rods and/or cones of photoreceptors. It is not known if this protein acts as a receptor, but its
structure is similar to that of the other plasma membrane-bound GCs.

The soluble forms of guanylate cyclase are cytoplasmic heterodimers. The two
subunits, alpha and beta, are highly related proteins of 70 to 82 Kd. Two
forms of beta subunits are currently known: beta-1 which seems to be expressed in lung and 
brain, and beta-2 which is more abundant in kidney and liver. 

The membrane and cytoplasmic forms of guanylate cyclase share a conserved domain 
which is probably important for the catalytic activity of the enzyme. Such a domain is also 
found twice in the different forms of membrane-bound adenylate cyclases (also known as 
class-III) [5,6] from mammals, slime mold or Drosophila. A consensus pattern was derived
from the most conserved region in that domain.

Consensus pattern G-V-ffeJ¥M ti UVM SEP ID NO:4 V f-yfn l )-G-x(5)-[FY1-x-fWM tiLTVM
^LymMl]4FYW]4GSl-fDMW^qj DN1:HKW SEP ID NQ:7 5()Yj-[nNT]-[TV]- 

Sequences known to belong to this class detected by the pattern ALL, except for the sea
urchin Arbacia punctulata resact receptor which lacks this domain.

Note: this pattern will detect both domains of adenylate cyclases class-III.

[1] Koesling D., Boehme E., Schultz G. FASEB J. 5:2785-2791(1991).
[2] Garbers D.L. New Biol. 2:499-504(1990).
[3] Garbers D.L. Cell 71:1-4(1992).
[4] Yuen P.S.T., Garbers D.L. Annu. Rev. Neurosci. 15:193-225(1992).
[5] Iyengar R. FASEB J. 7:768-775(1993).
[6] Barzu O., Danchin A. Prog. Nucleic Acid Res. Mol. Biol. 49:241-283(1994).

956. Hemolysin-type calcium-binding region signature (HemolysinCabind)

Gram-negative bacteria produce a number of proteins which are secreted into the growth 
medium by a mechanism that does not require a cleaved N-terminal signal sequence. These 
proteins, while having different functions, seem [1] to share two properties: they bind 
calcium and they contain a variable number of tandem repeats consisting of a nine amino acid 
motif rich in glycine, aspartic acid and asparagine. It has been shown [2] that such a domain 
is involved in the binding of calcium ions in a parallel beta roll structure. The proteins which 
are currently known to belong to this category are: 

- Hemolysins from various species of bacteria. Bacterial hemolysins are exotoxins that attack 
blood cell membranes and cause cell rupture. The hemolysins which are known to contain 
such a domain are those from: E. coli (gene hlyA), A. pleuropneumoniae (gene appA), A. 
actinomycetemcomitans and P. haemolytica (leukotoxin) (gene lktA). 

- Cyclolysin from Bordetella pertussis (gene cyaA). A multifunctional protein which is both 
an adenylate cyclase and a hemolysin. 

- Extracellular zinc proteases: serralysin (EC 3.4.24.40) from Serratia, prtB and prtC from 
Erwinia chrysanthemi and aprA from Pseudomonas aeruginosa. 

- Nodulation protein nodO from Rhizobium leguminosarum.

A signature pattern was derived from conserved positions in the sequence of the
calcium-binding domain.

Consensus pattern D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D

Sequences known to belong to this class detected by the pattern ALL.

Note: This pattern is found once in nodO and the extracellular proteases but up to 5 times in 
some hemolysin/cyclolysins. 
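
Because the pattern can occur several times in one protein, and tandem repeats may overlap, counting its occurrences calls for an overlapping scan. The following is a minimal sketch, assuming the consensus pattern above has been hand-translated into a regular expression; the function name calcium_site_starts and the toy sequence are hypothetical and do not represent a real hemolysin.

```python
import re

# D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D wrapped in a lookahead so that
# overlapping occurrences are also reported.
CA_BINDING = re.compile(r"(?=(D.[LI].{4}G.D.[LI].GG.{3}D))")


def calcium_site_starts(sequence):
    """Return the zero-based start of every (possibly overlapping) match."""
    return [m.start() for m in CA_BINDING.finditer(sequence)]


# Hypothetical 19-residue segment that satisfies the pattern.
motif = "DALAAAAGADALAGGAAAD"
toy = "MK" + motif + "TT" + motif
starts = calcium_site_starts(toy)
```

The length of the returned list is the number of copies of the signature, which per the note above would be expected to be one for nodO and the extracellular proteases and larger for some hemolysins.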

[ 1] Economou A., Hamilton W.D.O., Johnston A.W.B., Downie J.A. EMBO J. 9:349- 
354(1990). 

[ 2] Baumann U., Wu S., Flaherty K.M., McKay D.B. EMBO J. 12:3357-3364(1993). 

957. Hint module (Hint) 

This is an alignment of the Hint module in the Hedgehog proteins. It does not include any 
Inteins which also possess the Hint module. 
Number of members: 36 

[1] Hall TM, Porter JA, Young KE, Koonin EV, Beachy PA, Leahy DJ; Medline: 97474313 
"Crystal structure of a Hedgehog autoprocessing domain: homology between Hedgehog and 
self-splicing proteins." Cell 1997;91:85-97. 

958. Hydantoinase/oxoprolinase (Hydantoinase) 

This family includes the enzymes hydantoinase and oxoprolinase EC:3.5.2.9. Both reactions
involve the hydrolysis of 5-membered rings via hydrolysis of their internal imide bonds [1]. 
Number of members: 14 

[1] Ye GJ, Breslow EB, Meister A, Guo-jie GE$[corrected to Ye GJ]; Medline: 97113037
"The amino acid sequence of rat kidney 5-oxo-L-prolinase determined by cDNA cloning"
[published erratum appears in J Biol Chem 1997 Feb 14;272(7):4646] J Biol Chem
1996;271:32293-32300.

959. IMP dehydrogenase / GMP reductase signature (IMPDH_N)

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo
GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP
dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is 
associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian 
and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase 
isozymes in humans [2]. 

GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive 
deamination of GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide 
derivatives of G to A nucleotides, and maintains intracellular balance of A and G nucleotides. 

IMP dehydrogenase and GMP reductase share many regions of sequence similarity. One of 
these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. 
This region was used as a signature pattern. 

Consensus pattem{MVM4 [LfyM SEP ID NO:4^} -rRKW¥M #LIVM SEP ID NO :4)]-n. 
ft^M]{UVM SEfi.lD.]^:4fl [C is the 

putative IMP-binding residue] Sequences known to belong to this class detected by the 
pattern ALL. 



[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988). 

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem.
265:5292-5295(1990).

[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988). 



960. impB/mucB/samB family (IMS) 



These proteins are involved in UV protection (Swiss). 
Number of members: 38 



Reference No. 2750-942P 



961. Type II intron maturase (Intron_maturas2) 

Group II introns use intron-encoded reverse transcriptase, maturase and DNA endonuclease
activities for site-specific insertion into DNA [2]. Although this type of intron is
self-splicing in vitro, it requires a maturase protein for splicing in vivo. It has been
shown that a specific region of the aI2 intron is needed for the maturase function [1].
This region was found to be conserved in group II introns and called domain X [3].

Number of members: 335 

[1] Moran JV, Mecklenburg KL, Sass P, Belcher SM, Mahnke D, Lewin A, Perlman P; 
Medline: 94301788 "Splicing defective mutants of the COXI gene of yeast mitochondrial 
DNA: initial definition of the maturase domain of the group II intron aI2." Nucleic Acids Res
1994;22:2057-2064. 

[2] Guo H, Zimmerly S, Perlman PS, Lambowitz AM; Medline: 98031910 "Group II intron 
endonucleases use both RNA and protein subunits for recognition of specific sequences in 
double-stranded DNA." EMBO J 1997;16:6835-6848. 

[3] Mohr G, Perlman PS, Lambowitz AM; Medline: 94077696 "Evolutionary relationships 
among group II intron-encoded proteins and identification of a conserved domain that may be 
related to maturase function." Nucleic Acids Res 1993;21:4991-4997.

962. LAGLIDADG endonuclease (Intronmaturase) 

[1] Heath PJ, Stephens KM, Monnat RJ Jr, Stoddard BL; Medline: 97331323 "The structure
of I-CreI, a group I intron-encoded homing endonuclease." Nat Struct Biol 1997;4:468-476.

[2] Belfort M, Roberts RJ; Medline: 97402526 "Homing endonucleases: keeping the house in
order." Nucleic Acids Res 1997;25:3379-3388. 

[3] Dalgaard JZ, Klar AJ, Moser MJ, Holley WR, Chatterjee A, Mian IS; Medline: 98026854 
"Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases 
and identification of an intein that encodes a site-specific endonuclease of the HNH family." 
Nucleic Acids Res 1997;25:4626-4638. 



Number of members: 220

963. Isopentenyl transferase (IPT)

Isopentenyl transferase / dimethylallyl transferase synthesizes isopentenyladenosine
5'-monophosphate, a cytokinin that induces shoot formation on host plants infected with the Ti
plasmid [1].

Number of members: 16

[1] Canaday J, Gerad JC, Crouzet P, Otten L; Medline: 93101133 "Organization and
functional analysis of three T-DNAs from the vitopine Ti plasmid pTiS4." Mol Gen Genet
1992;235:292-303. 

964. Laminin EGF-like (Domains III and V) (laminin_EGF) 

This family is like EGF but has 8 conserved cysteines instead of 6. 
Number of members: 501 

[1] Engel J; Medline: 93041759 "Laminins and other strange proteins." Biochemistry 
1992;31:10643-10651. 

965. Legume lectins signatures (lectin_legA)

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. 
These lectins are generally found in the seeds. The exact function of legume lectins is not 
known but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and 
in the protection against pathogens. Legume lectins bind calcium and manganese (or other 
transition metals). 

Legume lectins are synthesized as precursor proteins of about 230 to 260 amino acid 
residues. Some legume lectins are proteolytically processed to produce two chains: beta 
(which corresponds to the N-terminal) and alpha (C-terminal). The lectin concanavalin A 
(conA) from jack bean is exceptional in that the two chains are transposed and ligated (by
formation of a new peptide bond). The N-terminus of mature conA thus corresponds to that
of the alpha chain and the C-terminus to the beta chain. 

Two signature patterns were developed specific to legume lectins: the first is located in the C- 
terminal section of the beta chain and contains a conserved aspartic acid residue important for 
the binding of calcium and manganese; the second one is located in the N-terminal of the 
alpha chain. 

Consensus pattern [LIV1-fg£Ae-? rSTAG S EP IP NO:2Q)]-V-ffia^rnpny SEO jp_ 
NO:358)j -[FLI]-D-[ST] [D binds manganese and calcium] Sequences known to belong to 
this class detected by the pattern ALL. 

Consensus pattern [LIV]-x-[EDQ]-ff¥WKRf rFYWKB. SEO ID NO^Qfl .v.v^oizqrT p. T 
SEOJ B NO: i 27 )1-G-[LF]-[ST] Sequences known to belong to this class detected by the 
pattern ALL. 

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986). 

966. Malate synthase signature (malate_synthase) 

Malate synthase (EC 4.1.3.2) catalyzes the aldol condensation of glyoxylate with acetyl-CoA 
to form malate - the second step of the glyoxylate bypass, an alternative to the tricarboxylic 
acid cycle in bacteria, fungi and plants. Malate synthase is a protein of 530 to 570 amino 
acids whose sequence is highly conserved across species [1]. As a signature pattern, a very 
conserved region was selected in the central section of the enzyme. 

Consensus pattem[KR]-{M^DENQ..S EQ IP NO:371 )]-H- x (2)-fi-T-M-vr T - x - W - D _Y- 
fWM ttU VM SEP IP NO:4 ) ]-F Sequences known to belong to this class detected by the 
pattern ALL. 



[ 1] Bruinenberg P.G., Blaauw M., Kazemier B., Ab G. Yeast 6:245-254(1990).

967. MatK/TrnK amino terminal region (MatK_N)

[1] Mohr G, Perlman PS, Lambowitz AM; Medline: 94077696 "Evolutionary relationships 
among group II intron-encoded proteins and identification of a conserved domain that may be 
related to maturase function." Nucleic Acids Res 1993;21:4991-4997.

Number of members: 495 

968. MOZ/SAS family (MOZ_SAS) 

This region of these proteins has been suggested to be homologous to acetyltransferases [1]. 
However the similarity is not supported by standard sequence analysis. 
Number of members: 15

[1] Kamine J, Elangovan B, Subramanian T, Coleman D, Chinnadurai G; Medline: 96182937
"Identification of a cellular protein that specifically interacts with the essential cysteine
region of the HIV-1 Tat transactivator." Virology 1996;216:357-366.

[2] Reifsnyder C, Lowell J, Clarke A, Pillus L; Medline: 96376969 "Yeast SAS silencing
genes and human genes associated with AML and HIV-1 Tat interactions are homologous
with acetyltransferases" [see comments] [published erratum appears in Nat Genet 1997
May;16(1):109] Nat Genet 1996;14:42-49.

969. mRNA capping enzyme (mRNA_cap_enzyme) 

[1] Hakansson K, Doherty AJ, Shuman S, Wigley DB; Medline: 97304383 "X-ray
crystallography reveals a large conformational change during guanyl transfer by mRNA
capping enzymes." Cell 1997;89:545-553.

Number of members: 7 

970. DNA mismatch repair proteins mutS family signature (MutS_C)

Mismatch repair contributes to the overall fidelity of DNA replication [1]. It involves the
correction of mismatched base pairs that have been missed by the proofreading element of the
DNA polymerase complex. The sequences of some proteins involved in mismatch repair in
different organisms have been found to be evolutionarily related [2,3]. One of these families is
called mutS [4,E1]; it consists of:

- Prokaryotic mutS protein (also called hexA in Streptococcus pneumoniae). MutS is
thought to carry out the mismatch recognition step of DNA repair.

- Eukaryotic MSH1, which is involved in mitochondrial DNA repair. 

- Eukaryotic MSH2, which is involved in nuclear postreplication mismatch repair. MSH2 
heterodimerizes with MSH6. In man, MSH2 is involved in a form of familial hereditary 
nonpolyposis colon cancer (HNPCC). 

- Eukaryotic MSH3, which is probably involved in the repair of large loops. 

- Eukaryotic MSH4, which is involved in meiotic recombination. 

- Eukaryotic MSH5, which is involved in meiotic recombination. 

- Eukaryotic MSH6 (also known as G/T mismatch binding protein), a DNA-repair protein 
that binds to G/T mismatches through heterodimerization with MSH2. 

- Prokaryotic protein mutS2 whose function is not yet known. 

- A coral (Sarcophyton glaucum) mitochondrial encoded mutS-like protein. 

As a signature pattern for this class of mismatch repair proteins, a region rich in glycine and
negatively charged residues was selected. This region is found in the C-terminal section of
these proteins, about 80 residues C-terminal to an ATP-binding site motif 'A' (P-loop)
(see <PDOC00017>).

Consensus pattem[STHfcA%l^^ 

x-D-E-[LP^4F¥^^ Sequences 
known to belong to this class detected by the pattern ALL, except for mutS2. 

[1] Modrich P. Annu. Rev. Biochem. 56:435-466(1987).
[2] Haber L.T., Walker G.C. EMBO J. 10:2707-2715(1991).
[3] New L., Liu K., Crouse G.F. Mol. Gen. Genet. 239:97-108(1993).
[4] Eisen J.A. Nucleic Acids Res. 26:4291-4300(1998).



971. MutS family, N-terminal putative DNA binding domain (MutS_N) 
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This family consists of the N-terminal region of proteins in the mutS family of DNA 
mismatch repair proteins and is found associated with MutS_C located in the C-terminal 
region. The mutS family of proteins is named after the Salmonella typhimurium MutS protein
involved in mismatch repair; other members of the family include the eukaryotic MSH
1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human
MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a
mismatch binding protein [2]. The aligned region corresponds in part with domains A1, A2
(which may bind DNA) and B (which binds dsDNA in vitro) from T. thermophilus MutS as
characterised in [1].

Number of members: 43 

972. Domain in Myosin and Kinesin Tails (MyTH4) 

Domain present twice in myosin-VIIa, and also present in 3 other myosins.

[1] Chen ZY, Hasson T, Kelley PM, Schwender BJ, Schwartz MF, Ramakrishnan M,
Kimberling WJ, Mooseker MS, Corey DP; Medline: 97038686 "Molecular cloning and
domain structure of human myosin-VIIa, the gene product defective in Usher syndrome 1B."
Genomics 1996;36:440-448.

Number of members: 21

973. Sodium and potassium ATPases beta subunits signatures (Na_K-ATPase) 

The sodium pump (Na+,K+ ATPase), located in the plasma membrane of all animal cells [1], 
is a heterotrimer of a catalytic subunit (alpha chain), a glycoprotein subunit of about 34 Kd
(beta chain) and a small hydrophobic protein of about 6 Kd. The beta subunit seems [2] to
regulate, through the assembly of alpha/beta heterodimers, the number of sodium pumps
transported to the plasma membrane.

Structurally the beta subunit is composed of a charged cytoplasmic domain of about 35 
residues, followed by a transmembrane region, and a large extracellular domain that contains 
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three disulfide bonds and glycosylation sites. This structure is schematically represented in 
the figure below. 

xxxxxxxxxxxxxxxxxxxxxxxxCxxxxCxCxxCxxxxxxxCxxxxxxxxxxxCxxxx
******** <-Cyt-><TM><      Extracellular      >

'C': conserved cysteine involved in a disulfide bond.
'*': position of the patterns.

Two isoforms of the beta subunit (beta-1 and beta-2) are currently known; they share about
50% sequence identity. Gastric (K+, H+) ATPase (proton pump), responsible for acid
production in the stomach, consists of two subunits [3]; the beta chain is highly similar to the
sodium pump beta subunits. Two signature patterns were developed for beta subunits. The
first is located in the cytoplasmic domain, while the second is found in the extracellular
domain and contains two of the cysteines involved in disulfide bonds.

Consensus pattern [FYW]-x(2HFYW]-x-[FYWH^ S EP ID NO : 4)]- 

G-R-T-x(3)-W Sequences known to belong to this class detected by the pattern ALL. 

Consensus pattern: [RK]-x(2)-C-[RKQWI (SEQ ID NO:752)]-x(5)-L-x(2)-C-[SA]-G
[The two C's are involved in disulfide bonds] Sequences known to belong to this class
detected by the pattern: ALL, except for the beta subunit of the sodium pump of brine shrimp,
whose sequence is highly divergent in that region.

[1] Horisberger J.D., Lemas V., Krahenbul J.P., Rossier B.C. Annu. Rev. Physiol. 53:565-584(1991).
[2] McDonough A.A., Gerring K., Farley R.A. FASEB J. 4:1598-1605(1990).
[3] Toh B.-H., Gleeson P.A., Simpson R.J., Moritz R.L., Callaghan J.M., Goldkorn I., Jones
C.M., Martinelli T.M., Mu F.-T., Humphris D.C., Pettitt J.M., Mori Y., Masuda T.,
Sobieszczuk P., Weinstock J., Mantamadiotis T., Baldwin G.S. Proc. Natl. Acad. Sci. U.S.A.
87:6418-6422(1990).

974. Respiratory-chain NADH dehydrogenase subunit 1 signatures (NADHdh) 
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Respiratory-chain NADH dehydrogenase (EC 1.6.53) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner 
mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria 
(as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this
bioenergetic enzyme complex there are fifteen which are located in the membrane part, seven 
of which are encoded by the mitochondrial and chloroplast genomes of most species. The 
most conserved of these organelle-encoded subunits is known as subunit 1 (gene ND1 in 
mitochondrion, and NDH1 in chloroplast) and seems to contain the ubiquinone binding site. 

The ND1 subunit is highly similar to subunit 4 of Escherichia coli formate hydrogenlyase
(gene hycD) and to subunit C of hydrogenase-4 (gene hyfC). Paracoccus denitrificans NQO8 and
Escherichia coli nuoH NADH-ubiquinone oxidoreductase subunits also belong to this family
[3]. Two signature patterns were developed based on conserved regions of this subunit.

Consensus pattern G-fLP^F¥K^)£LW 

D - feUM&fl f ' A G I M SEP ID N 0 : 7 54)1 - [ LI. V M.FT A ] [T.J VMFT A SEP ID NO:386)| - K- 
^M^qgiXVMYST SEP ID NO: 755Y14L^MF¥GI[ L1VMFYG SBC I D M P : 1 68 )]-x- 
2 0 [KR]-[EQG] Sequences known to belong to this class detected by the patternALL, except for 
watermelon and Leishmania ND1. 

Consensus pattern P>F>D-P^VM¥¥P)iI ,IVMF YO S E P TP NO:1 8 8V|- 

2 5 NO:429)j-x(2)-G Sequences known to belong to this class detected by the pattern ALL 5 
except for Chlamydomonas reinhardtii and Pisaster ochraceus ND1, and tobacco NDH1. 

[1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[3] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

975. Nickel-dependent hydrogenases large subunit signatures (NiFeSe_Hases) 
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Hydrogenases are enzymes that catalyze the reversible activation of hydrogen and which 
occur widely in prokaryotes as well as in some eukaryotes. There are various types of 
hydrogenases, but all of them seem to contain at least one iron-sulfur cluster. They can be 
broadly divided into two groups: hydrogenases containing nickel and, in some cases, also 
selenium (the [NiFe] and [NiFeSe] hydrogenases) and those lacking nickel (the [Fe] 
hydrogenases). 

The [NiFe] and [NiFeSe] hydrogenases are heterodimers that consist of a small subunit that
contains a signal peptide and a large subunit. All the known large subunits seem to be
evolutionarily related [1]; they contain two Cys-x-x-Cys motifs, one at their N-terminal end,
the other at their C-terminal end. These four cysteines are involved in the binding of nickel 
[2]. In the [NiFeSe] hydrogenases the first cysteine of the C-terminal motif is a 
selenocysteine which has experimentally been shown to be a nickel ligand [3]. Two patterns 
were developed which are centered on the Cys-x-x-Cys motifs. 

Alcaligenes eutrophus possesses an NAD-reducing cytoplasmic hydrogenase (hoxS) [4]; this
enzyme is composed of four subunits. Two of these subunits (beta and delta) are responsible
for the hydrogenase reaction and are evolutionarily related to the large and small subunits of
membrane-bound hydrogenases. The alpha subunit of coenzyme F420 hydrogenase (EC 
1.12.99.1) (FRH) from archaebacterial methanogens also belongs to this family. 

Consensus pattern R-G-ff vtVMI^f L I V MP SEP ID NO:2 );j-E>x(15)4QESMjr OESM SEP ID 
NQ:757)]-R-x-C-G-}^P^ [The two C's are nickel ligands] 

Sequences known to belong to this class detected by the pattern ALL. 

Consensus pattern: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H [The two C's are nickel ligands]
Sequences known to belong to this class detected by the pattern: ALL.
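
As a rough illustration of how a signature written in this notation can be applied, the following Python sketch converts a PROSITE-style pattern into a regular expression and scans a sequence with it. The function name prosite_to_regex and the toy sequence are hypothetical and do not come from the source; only the basic syntax elements (x, [..], {..} and repeat counts) are handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split each element into its core and an optional repeat such as (4) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = core                      # literal residue or [class]
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)

# Second large-subunit pattern from the text above.
regex = prosite_to_regex("[FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H")

# Hypothetical toy sequence, used only to exercise the pattern.
toy_sequence = "MKFDPCLACAAHW"
match = re.search(regex, toy_sequence)
```

Here match.start() gives the 0-based offset of the hit, so the two cysteines proposed as nickel ligands sit at offsets start + 3 and start + 6 within the sequence.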

[1] Menon N.K., Robbins J., Peck H.D. Jr., Chatelus C.Y., Choi E.-S., Przybyla A.E. J.
Bacteriol. 172:1969-1977(1990).
[2] Volbeda A., Charon M.-H., Piras C., Hatchikian E.C., Frey M., Fontecilla-Camps J.C.
Nature 373:580-587(1995).



[3] Eidsness M.K., Scott R.A., Prickrill B., der Vartaninan D.V., LeGall J., Moura I., Moura
J.J.G., Peck H.D. Jr. Proc. Natl. Acad. Sci. U.S.A. 86:147-151(1989).
[4] Tran-Betcke A., Warnecke U., Boecker C., Zaborosch C., Friedrich B. J. Bacteriol.
172:2920-2929(1990).

976. NADH-Ubiquinone oxidoreductase (complex I), chain 5 C-terminus (oxidored_q1_C)

This sub-family represents a carboxyl terminal extension of oxidored_q1. Only NADH-Ubiquinone
chain 5 proteins from chloroplasts are in this family. This sub-family is part of complex I,
which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is
associated with proton translocation across the membrane.
Number of members: 572 

[1] Walker JE; Medline: 93110040 "The NADH ubiquinone oxidoreductase (complex I) of
respiratory chains." Q Rev Biophys 1992;25:253-324.

977. NADH-Ubiquinone oxidoreductase (complex I), chain 5 N-terminus (oxidored_q1_N)

This sub-family represents an amino terminal extension of oxidored_q1. Only NADH-Ubiquinone
chain 5 and eubacterial chain L are in this family. This sub-family is part of
complex I, which catalyses the transfer of two electrons from NADH to ubiquinone in a
reaction that is associated with proton translocation across the membrane.
Number of members: 546 

[1] Walker JE; Medline: 93110040 "The NADH ubiquinone oxidoreductase (complex I) of
respiratory chains." Q Rev Biophys 1992;25:253-324.

978. oxidored_q2. NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 4L (EC 1.6.5.3). 
ND4L OR NAD4L. Arabidopsis thaliana (Mouse-ear cress). Mitochondrion. OC Eukaryota;
Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons;
Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis.

CATALYTIC ACTIVITY: NADH + UBIQUINONE = NAD(+) + UBIQUINOL. 
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[1] SEQUENCE FROM N.A. MEDLINE; 93156682. Brandt P., Sunkel S., Unseld M., 
Brennicke A., Knoop V.; "The nad4L gene is encoded between exon c of nad5 and orf25 in 
the Arabidopsis mitochondrial genome."; Mol. Gen. Genet. 236:33-38(1992). 
[2] SEQUENCE FROM N.A. STRAIN=CV. COLUMBIA; MEDLINE; 97141919 Unseld 
M., Marienfeld J.R., Brandt P., Brennicke A.; "The mitochondrial genome of Arabidopsis 
thaliana contains 57 genes in 366,924 nucleotides."; Nat. Genet. 15:57-61(1997). 

979. oxidored_q4. Protein name: NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN
3, CHLOROPLAST. Synonym(s): EC 1.6.5.3. Gene name(s): NDHC OR NDH3. From: Zea
mays (Maize). Encoded on: Chloroplast. Taxonomy: Eukaryota; Viridiplantae; Embryophyta;
Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Zea.

CATALYTIC ACTIVITY: NADH + PLASTOQUINONE = NAD(+) + 
PLASTOQUINOL. 

SIMILARITY: BELONGS TO THE COMPLEX I SUBUNIT 3 FAMILY. 

[1] SEQUENCE FROM N.A. MEDLINE; 89281491. Steinmueller K., Ley A.C., Steinmetz 
A.A., Sayre R.T., Bogorad L.; "Characterization of the ndhC-psbG-ORF157/159 operon of
maize plastid DNA and of the cyanobacterium Synechocystis sp. PCC6803."; Mol. Gen. 
Genet. 216:60-69(1989). 

[2] SEQUENCE FROM N.A. MEDLINE; 95395841. Maier R.M., Neckermann K., Igloi 
G.L., Koessel H.; "Complete sequence of the maize chloroplast genome: gene content, 
hotspots of divergence and fine tuning of genetic information by transcript editing."; J. Mol. 
Biol. 251:614-628(1995). 

980. PAC: PAC motif 

PAC motif occurs C-terminal to a subset of all known PAS motifs. It is proposed to 
contribute to the PAS domain fold [3]. Number of members: 181 

[1] Medline: 97446881 PAS domain S-boxes in archaea, bacteria and sensors for oxygen and 
redox. Zhulin IB, Taylor BL, Dixon R; Trends Biochem Sci 1997;22:331-333. 
[2] Medline: 95275818. 1.4 Å structure of photoactive yellow protein, a cytosolic
photoreceptor: unusual fold, active site, and chromophore. Borgstahl GE, Williams DR, 
Getzoff ED; Biochemistry 1995;34:6278-6287. 
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[3] Medline: 98044337. PAS: a multifunctional domain family comes to light. Ponting CP, 
Aravind L; Curr Biol 1997;7:674-677. 

981. PARP: Poly(ADP-ribose) polymerase catalytic region. 

Poly(ADP-ribose) polymerase catalyses the covalent attachment of ADP-ribose units from 
NAD+ to itself and to a limited number of other DNA binding proteins, which decreases their 
affinity for DNA. Poly(ADP-ribose) polymerase is a regulatory component induced by DNA 
damage. 

The carboxyl-terminal region is the most highly conserved region of the protein. Experiments 
have shown that a carboxyl 40 kDa fragment is still catalytically active [2]. Number of 
members: 19 

[1] Medline: 96353841 Structure of the catalytic fragment of poly(ADP-ribose) polymerase
from chicken. Ruf A, Mennissier de Murcia J, de Murcia G, Schulz GE; Proc Natl Acad Sci 
USA 1996;93:7481-7485. 

[2] Medline: 93293867 The carboxyl-terminal domain of human poly(ADP-ribose) 
polymerase. Overproduction in Escherichia coli, large scale purification, and 
characterization. Simonin F, Hofferer L, Panzeter PL, Muller S, de Murcia G, Althaus FR; J 
Biol Chem 1993;268:13454-13461. 

982. PC_rep: Proteasome/cyclosome repeat 

[1] Medline: 97348748 A repetitive sequence in subunits of the 26S proteasome and 20S 
cyclosome (anaphase-promoting complex). Lupas A, Baumeister W, Hofmann K; Trends 
Biochem Sci 1997;22:195-196. 
Number of members: 112 

983. Peptidase_M1: Peptidase family M1

Members of this family are aminopeptidases. The members differ widely in specificity, 
hydrolysing acidic, basic or neutral N-terminal residues. This family includes leukotriene-A4
hydrolase (Swiss:P09960); this enzyme also has an aminopeptidase activity [1]. Number of
members: 72
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[1] Medline: 95405261 Evolutionary families of metallopeptidases. RawlingsND, Barrett AJ; 
Meth Enzymol 1995;248:183-228. 

984. Neutral zinc metallopeptidases, zinc-binding region signature (Peptidase_M8) 
PROSITE cross-reference(s) PS00142; ZINC_PROTEASE 

The majority of zinc-dependent metallopeptidases (with the notable exception of the 
carboxypeptidases) share a common pattern of primary structure [1,2,3] in the part of their 
sequence involved in the binding of zinc, and can be grouped together as a 
superfamily, known as the metzincins, on the basis of this sequence similarity. They can be
classified into a number of distinct families [4,E1], which are listed below along with the
proteases which are currently known to belong to these families.

Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).

- Mammalian aminopeptidase N (EC 3.4.11.2).

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a
role in regulating growth and differentiation of early B-lineage cells. 

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1). 

- Yeast hypothetical protein YIL137c. 

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of 
an epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is 
capable of peptidase activity. 

Family M2 

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the
enzyme responsible for hydrolyzing angiotensin I to angiotensin II. There are two forms
of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.

Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic 
degradation of small peptides. 

- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal 
endopeptidase). 
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- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved the 
second stage of processing of some proteins imported in the mitochondrion. 

- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD). 

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene
dcp).

- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC). 

- Yeast hypothetical protein YKL134c.

Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases
(bacillolysins) (EC 3.4.24.28) from various species of Bacillus.

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB). 

- Extracellular elastase from Staphylococcus epidermidis. 

- Extracellular protease prt1 from Erwinia carotovora.

- Extracellular minor protease smp from Serratia marcescens.

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.

- Protease prtA from Listeria monocytogenes. 

- Extracellular proteinase proA from Legionella pneumophila. 

Family M5 

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of 
insect antibacterial proteins, attacins and cecropins. 

Family M7 

- Streptomyces extracellular small neutral proteases.

Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from
various species of Leishmania. 



Family M9 
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- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio 
alginolyticus. 

Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.

- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA). 

- Secreted proteases A, B, C and G from Erwinia chrysanthemi. 

- Yeast hypothetical protein YIL108w. 

Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC
3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC
3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34)
(neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22)
(stromelysin-2), MMP-11 (stromelysin-3) and MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).

- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the
embryo to digest the protective envelope derived from the egg extracellular matrix.

- Soybean metalloendoproteinase 1.

Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE). 
Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.

- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border 
metalloendopeptidase. 

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone
formation and which expresses metalloendopeptidase activity. The Drosophila homolog
of BMP-1 is the dorsal-ventral patterning protein tolloid.

- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN
from Strongylocentrotus purpuratus. 

- Caenorhabditis elegans protein toh-2. 
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- Caenorhabditis elegans hypothetical protein F42A10.8. 

- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE
and HCE) from the fish Oryzias latipes. These proteases participate in the breakdown
of the egg envelope, which is derived from the egg extracellular matrix, at the time of
hatching.

Family M12B

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in
hemorrhage. Examples are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC
3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72), trimerelysin I (EC
3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 

Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of
endothelin to release the active peptide. 

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein 
is very probably a zinc endopeptidase. 

- Peptidase O from Lactococcus lactis (gene pepO). 

Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins 
(BoNT). These toxins are zinc proteases that block neurotransmitter release by 
proteolytic cleavage of synaptic proteins such as synaptobrevins, syntaxin and SNAP-25 
[7,8]. 

Family M30 

- Staphylococcus hyicus neutral metalloprotease. 
Family M32 

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme 
from Thermus aquaticus which is most active at high temperature. 
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Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the 
anthrax toxin. 

Family M35 

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various 
species of Aspergillus. 

Family M36

- Extracellular elastinolytic metalloproteinases from Aspergillus.

From the tertiary structure of thermolysin, the positions of the residues acting as zinc
ligands and those involved in the catalytic activity are known. Two of the zinc ligands are
histidines which are very close together in the sequence; C-terminal to the first histidine is
a glutamic acid residue which acts as a nucleophile and promotes the attack of a water
molecule on the carbonyl carbon of the substrate. A signature pattern which includes the
two histidines and the glutamic acid residue is sufficient to detect this superfamily of
proteins.
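
The arrangement described above (a histidine, the glutamic acid immediately C-terminal to it, and a second histidine close by) can be sketched in Python as a scan for a simplified His-Glu-x-x-His core. This simplified motif is an assumption made for illustration and is not the full consensus pattern of this entry; the names HEXXH and find_hexxh and the toy sequence are likewise hypothetical.

```python
import re

# Simplified core motif His-Glu-x-x-His. This is an illustrative
# assumption, NOT the full consensus pattern of the entry.
# The lookahead allows overlapping hits to be reported.
HEXXH = re.compile(r"(?=(HE..H))")

def find_hexxh(sequence: str):
    """Return (first His, Glu, second His) 1-based positions for each hit."""
    hits = []
    for m in HEXXH.finditer(sequence.upper()):
        first_his = m.start() + 1              # convert 0-based offset to 1-based
        hits.append((first_his, first_his + 1, first_his + 4))
    return hits

# Hypothetical toy sequence, used only to exercise the scan.
toy_sequence = "GAVIHELGHALGL"
hits = find_hexxh(toy_sequence)
```

A scan this loose will match many unrelated proteins, which is why the actual signature also constrains the flanking positions.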

Consensus pattem[GSTALIVNj[G^ 

^VMF¥W4[ LIVMFYW SEP ID NO:26)]-4QEt4f«^{ DEHRKP SEP ID NQ:6 80)|-H-x- 
^PJMFYWGSP&ftU VM FY W GSPQ SEQ ID NO;68 lj] 
[The two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except
for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT: 57; including Neurospora crassa
conidiation-specific protein 13 which could be a zinc-protease.
[1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).
[2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).
[3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B.,
Stoecker W. Zoology 99:237-246(1996).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
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[5]Woessner J. Jr. FASEB J. 5:2145-2154(1991). 

[6] Hite L.A., Fox J.W., Bjarnason J.B. Biol. Chem. Hoppe-Seyler 373:381-385(1992).
[7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).
[8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

985. PHO4: Phosphate transporter family

This family includes PHO-4 from Neurospora crassa, which is a Na(+)-phosphate
symporter [1]. This family also contains the leukemia virus receptor Swiss:Q08344. Number
of members: 41

[1] Medline: 95249577 Repressible cation-phosphate symporters in Neurospora crassa. 
Versaw WK, Metzenberg RL; Proc Natl Acad Sci U S A 1995;92:3884-3887. 

986. Photosynthetic reaction center proteins signature (photoRC) 
PROSITE cross-reference(s): PS00244; REACTION_CENTER

In the photosynthetic reaction center of purple bacteria, two homologous integral 
membrane proteins, L(ight) and M(edium), are known to be essential to the light-mediated 
water-splitting process. In the photosystem II of eukaryotic chloroplasts two related 
proteins are involved: the D1 (psbA) and D2 proteins (psbD). These four types of protein
probably evolved from a common ancestor [see 1,2 for recent reviews]. 

A signature pattern was developed which includes two conserved histidine residues. In L
and M chains, the first histidine is a ligand of the magnesium ion of the special pair
bacteriochlorophyll, the second is a ligand of a ferrous non-heme iron atom. In photosystem
II these two histidines are thought to play a similar role.

Consensus pattem|NQH]-x(4)-P-x-H-x(2MSAG 
x-H-[SAG](2) 

[The first H is a magnesium ligand] [The second H is an iron ligand]
Sequences known to belong to this class detected by the pattern: ALL, except
for broad bean psbA which has Gln instead of the second His.
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[lJMichel H., Deisenhofer J. Biochemistry 27:1-7(1988). 
[2] Barber J. Trends Biochem. Sci. 12:321-326(1987). 

987. phytochrome: Phytochrome region 

This family contains a region specific to phytochrome proteins. Number of members: 145

988. PI3K_C2: C2 domain 

Phosphoinositide 3-kinase region postulated to contain a C2 domain. Outlier of C2 family.
Number of members: 39

[1] Medline: 97388296 Using structure to define the function of phosphoinositide 3-kinase 
family members. Domin J, Waterfield MD; FEBS Lett 1997;410:91-95. 
[2] Medline: 97398940 Phosphoinositide 3-kinases: a conserved family of signal transducers. 
Vanhaesebroeck B, Leevers SJ, Panayotou G, Waterfield MD; Trends Biochem Sci
1997;22:267-272. 

989. PI3Ka: Phosphoinositide 3-kinase family, accessory domain (PIK domain) 
PIK domain is conserved in all PI3 and PI4-kinases. Its role is unclear but it has been
suggested [2] to be involved in substrate presentation.
Number of members: 47 

[1] Medline: 97388296 Using structure to define the function of phosphoinositide 3-kinase 
family members. Domin J, Waterfield MD; FEBS Lett 1997;410:91-95. 

[2] Medline: 94069320 Phosphatidylinositol 4-kinase: gene structure and requirement for
yeast cell viability. Flanagan CA, Schnieders EA, Emerick AW, Kunisawa R, Admon A,
Thorner J; Science 1993;262:1444-1448. 

990. P-II protein signatures 

PROSITE cross-reference(s): PS00496; PII_GLNB_UMP, PS00638; PII_GLNB_CTER

The P-II protein (gene glnB) is a bacterial protein important for the control of glutamine 
synthetase [1,2,3]. In nitrogen-limiting conditions, when the ratio of glutamine to 2- 



ketoglutarate decreases, P-II is uridylylated on a tyrosine residue to form P-II-UMP. P-II-
UMP allows the deadenylation of glutamine synthetase (GS), thus activating the enzyme.
Conversely, in nitrogen excess, P-II-UMP is deuridylylated and then promotes the adenylation
of GS. P-II also indirectly controls the transcription of the GS gene (glnA) by preventing
NR-II (ntrB) from phosphorylating NR-I (ntrC), which is the transcriptional activator of glnA.
Once P-II is uridylylated, these events are reversed.

P-II is an extremely well conserved protein of about 110 amino acid residues. The tyrosine
which is uridylylated is located in the central part of the protein.

In cyanobacteria, P-II seems to be phosphorylated on a serine residue rather than being
uridylylated.

In methanogenic archaebacteria, the nitrogenase iron protein gene (nifH) is followed by two
open reading frames highly similar to the eubacterial P-II protein [4]. These proteins could
be involved in the regulation of nitrogen fixation. 

In the red alga, Porphyra purpurea, there is a glnB homolog encoded in the chloroplast 
genome. 

Other proteins highly similar to glnB are: 

- Bacillus subtilis protein nrgB [5]. 

- Escherichia coli hypothetical protein ybaI [6].

Two signature patterns were developed for P-II protein. The first one is a conserved
stretch (in eubacteria) of six residues which contains the uridylylated tyrosine, the other
is derived from a conserved region in the C-terminal part of the P-II protein.

Consensus pattern: Y-[KR]-G-[AS]-[AE]-Y [The second Y is uridylylated]
Sequences known to belong to this class detected by the pattern: ALL glnB's
from eubacteria.
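
A minimal Python sketch of how the first P-II pattern might be used to locate the candidate modified tyrosine is given below. The regular expression is a direct transcription of Y-[KR]-G-[AS]-[AE]-Y; the names P2_SITE and uridylylation_sites and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# Y-[KR]-G-[AS]-[AE]-Y written as a regular expression. The lookahead
# lets overlapping occurrences be reported as separate hits.
P2_SITE = re.compile(r"(?=(Y[KR]G[AS][AE]Y))")

def uridylylation_sites(sequence: str):
    """Return the 1-based position of the second Tyr of every pattern hit."""
    # The hit starts at 0-based offset m.start(); the second Tyr is the
    # sixth residue of the hit, i.e. 1-based position m.start() + 6.
    return [m.start() + 6 for m in P2_SITE.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real glnB product.
toy_sequence = "MKAAYRGAEYMV"
sites = uridylylation_sites(toy_sequence)
```

For the toy sequence the hit spans residues 5 to 10, so the reported site is position 10.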
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Consensus pattem[ST]-x(3>G-[D^ 

x(2)-fWM4 fLIVM SEC) ID NO:4i] 

[1] Magasanik B. Biochimie 71:1005-1012(1989).
[2] Holtel A., Merrick M. Mol. Gen. Genet. 215:134-138(1988).
[3] Cheah E., Carr P.D., Suffolk P.M., Vasuvedan S.G., Dixon N.E., Ollis D.L. Structure
2:981-990(1994).
[4] Sibold L., Henriquet M., Possot O., Aubert J.-P. Res. Microbiol. 142:5-12(1991).
[5] Wray L.V. Jr., Atkinson M.R., Fisher S.H. J. Bacteriol. 176:108-114(1994).
[6] Allikmets R., Gerrard B.C., Court D., Dean M.C. Gene 136:231-236(1993).

991. PIP5K: Phosphatidylinositol-4-phosphate 5-Kinase 

This family contains a region from the common kinase core found in the type I 
phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family as described in [1]. The family 
consists of various type I, II and III PIP5K enzymes. PIP5K catalyses the formation of
phosphatidylinositol-4,5-bisphosphate via the phosphorylation of phosphatidylinositol-4-
phosphate, a precursor in the phosphoinositide signaling pathway. Number of members: 33

[1] Medline: 98204859. Type I phosphatidylinositol-4-phosphate 5-kinases. Cloning of the 
third isoform and deletion/substitution analysis of members of this novel lipid kinase family. 
Ishihara H, Shibasaki Y, Kizuki N, Wada T, Yazaki Y, Asano T, Oka Y; J Biol Chem 
1998;273:8741-8748. 

[2] Medline: 97115834 Type I phosphatidylinositol-4-phosphate 5-kinases are distinct
members of this novel lipid kinase family. Loijens JC, Anderson RA; J Biol Chem
1996;271:32937-32943.

992. PolyA_pol: Poly A polymerase family 

This family includes nucleic acid independent RNA polymerases, such as poly(A)
polymerase (EC:2.7.7.19), which adds the poly(A) tail to mRNA. This family also includes the
tRNA nucleotidyltransferase (EC:2.7.7.25) that adds the CCA to the 3' end of the tRNA.
Number of members: 31
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[1] Medline: 93066242 Identification of the gene for an Escherichia coli poly(A) polymerase. 
Cao GJ, Sarkar N; Proc Natl Acad Sci U S A 1992;89:10380-10384. 

993. Photosystem I psaA and psaB proteins signature (psaA_psaB) 
PROSITE cross-reference(s): PS00419; PHOTOSYSTEM_I_PSAAB

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to
mediate electron transfer from plastocyanin to ferredoxin. PSI is found in the chloroplasts
of plants and in cyanobacteria. The electron transfer components of the reaction center of
PSI are a primary electron donor P-700 (chlorophyll dimer) and five electron acceptors: A0
(chlorophyll), A1 (a phylloquinone) and three 4Fe-4S iron-sulfur centers: Fx, Fa, and Fb.

PsaA and psaB, two closely related proteins, are involved in the binding of P700, AO, Al, 
and Fx. psaA and psaB are both integral membrane proteins of 730 to 750 amino acids that 
15 seem to contain 1 1 transmembrane segments. The Fx 4Fe-4S iron-sulfur center is bound by 
four cysteines; two of these cysteines are provided by the psaA protein and the two others 
by psaB. The two cysteines in both proteins are proximal and located in a loop between 
the ninth and tenth transmembrane segments. A leucine zipper motif seems to be present [2] 
downstream of the cysteines and could contribute to dimerization of psaA/psaB. 

The signature pattern for these proteins is based on the perfectly conserved region that 
includes the two iron-sulfur binding cysteines. 

Consensus pattern: C-D-G-P-G-R-G-G-T-C [The two C's bind the iron-sulfur center]
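
Consensus patterns of this kind can be applied to a sequence programmatically. The following is a minimal sketch, assuming the common PROSITE conventions (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats). The function names `prosite_to_regex` and `find_signature` and the toy sequence are hypothetical and are not part of the original text.

```python
import re

# One pattern element: a residue, x, [allowed] or {excluded},
# optionally followed by a repeat count such as (3) or (3,5).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_signature(sequence: str, pattern: str):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Hypothetical toy sequence (not a real psaA/psaB protein) containing the motif.
toy_sequence = "AAACDGPGRGGTCWW"
hits = find_signature(toy_sequence, "C-D-G-P-G-R-G-G-T-C")
```

With the toy sequence above, `hits` holds a single match starting at residue 4. Anchors such as "<" and ">" and other extended PROSITE syntax are deliberately not handled in this sketch.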

[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).
[2] Webber A.N., Malkin R. FEBS Lett. 264:1-14(1990).

994. PSBH: Photosystem II 10 kDa phosphoprotein 

This protein is phosphorylated in a light dependent reaction. 
Number of members: 20



995. PsbJ

This family consists of the photosystem II reaction center protein PsbJ from plants and
cyanobacteria. In Synechocystis sp. PCC 6803, PsbJ regulates the number of photosystem II
centers in thylakoid membranes; it is a predicted 4 kDa protein with one membrane-spanning
domain [1]. Number of members: 20

[1] Medline: 93131892. Genetic and immunological analyses of the cyanobacterium 
Synechocystis sp. PCC 6803 show that the protein encoded by the psbJ gene regulates the 
number of photosystem II centers in thylakoid membranes. Lind LK, Shukla VK, Nyhus KJ,
Pakrasi HB; J Biol Chem 1993;268:1575-1579. 

996. PSBT: Photosystem II reaction centre T protein 

The exact function of this protein is unknown. It probably consists of a single transmembrane 
spanning helix. The Swiss:P37256 protein appears to be (i) a novel photosystem II subunit
and (ii) required for maintaining optimal photosystem II activity under adverse growth 
conditions [1]. Number of members: 17 

[1] Medline: 94298765. The chloroplast ycf8 open reading frame encodes a 
photosystem II polypeptide which maintains photosynthetic activity under adverse growth 
conditions. Monod C, Takahashi Y, Goldschmidt-Clermont M, Rochaix JD; EMBO J
1994;13:2747-2754. 

997. PSI_8. PHOTOSYSTEM I REACTION CENTRE SUBUNIT VIII. Synonym(s): PSI-I.
Gene name(s): PSAI. From Hordeum vulgare (Barley). Encoded on Chloroplast. Taxonomy:
Eukaryota; Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; 
Liliopsida; Poales; Poaceae; Hordeum. 

MAY HELP IN THE ORGANIZATION OF THE PSAL SUBUNIT. BELONGS TO THE 
PSAI FAMILY. 

[1] SEQUENCE FROM N.A. MEDLINE; 90036933. Scheller H.V., Okkels J.S., Hoej P.B.,
Svendsen I., Roepstorff P., Moeller B.L.; "The primary structure of a 4.0-kDa photosystem I
polypeptide encoded by the chloroplast psaI gene."; J. Biol. Chem. 264:18402-18406(1989).

998. PSI_PsaJ: Photosystem I reaction centre subunit IX / PsaJ

This family consists of the photosystem I reaction centre subunit IX or PsaJ from various
organisms including Synechocystis sp. (strain PCC 6803), Pinus thunbergii (green pine) and
Zea mays (maize). PsaJ (Swiss:P19443) is a small, 4.4 kDa, chloroplast-encoded, hydrophobic
subunit of the photosystem I reaction complex; its function is not yet fully understood [1].
PsaJ can be cross-linked to PsaF (Swiss:P12356) and has a single predicted transmembrane
domain. It has a proposed role in maintaining PsaF in the correct orientation to allow for fast
electron transfer from soluble donor proteins to P700+ [1]. Number of members: 18

[1] Medline: 99238330. A large fraction of PsaF is nonfunctional in photosystem I complexes
lacking the PsaJ subunit. Fischer N, Boudreau E, Hippler M, Drepper F, Haehnel W, Rochaix
JD; Biochemistry 1999;38:5546-5552.

[2] Medline: 93252282. Genes encoding eleven subunits of photosystem I from the 
thermophilic cyanobacterium Synechococcus sp. Muhlenhoff U, Haehnel W, Witt H,
Herrmann RG; Gene 1993;127:71-78. 

999. PSII. Protein name: PHOTOSYSTEM II P680 CHLOROPHYLL A APOPROTEIN.
Synonym(s): CP-47 PROTEIN. Gene name(s): PSBB. From Hordeum vulgare (Barley).
Encoded on Chloroplast. Taxonomy: Eukaryota; Viridiplantae; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Hordeum.

FUNCTION: THIS PROTEIN CONJUGATES WITH CHLOROPHYLL &
CATALYZES THE PRIMARY LIGHT-INDUCED PHOTOCHEMICAL PROCESSES OF
PHOTOSYSTEM II. SUBCELLULAR LOCATION: CHLOROPLAST THYLAKOID
MEMBRANE. SIMILARITY: BELONGS TO THE PSBB / PSBC FAMILY.

[1] SEQUENCE FROM N.A. STRAIN=CV. SABARLIS; MEDLINE; 89240047. Andreeva
A.V., Buryakova A.A., Reverdatto S.V., Chakhmakhcheva O.G., Efimov V.A.; "Nucleotide
sequence of the 5.2 kbp barley chloroplast DNA fragment, containing psbB-psbH-petB-petD
gene cluster."; Nucleic Acids Res. 17:2859-2860(1989).

[2] SEQUENCE FROM N.A. STRAIN=CV. SABARLIS; MEDLINE; 92207253. Efimov
V.A., Andreeva A.V., Reverdatto S.V., Chakhmakhcheva O.G.; "Photosystem II of rye.
Nucleotide sequence of the psbB, psbC, psbE, psbF, psbH genes of rye and chloroplast DNA
regions adjacent to them."; Bioorg. Khim. 17:1369-1385(1991).

[3] SEQUENCE OF 411-420. Hinz U.G.; "Isolation of the photosystem II reaction center
complex from barley. Characterization by circular dichroism spectroscopy and amino acid
sequencing."; Carlsberg Res. Commun. 50:285-298(1985).

1000. QRPTase. Quinolinate phosphoribosyl transferase.

Quinolinate phosphoribosyl transferase (QPRTase) or nicotinate-nucleotide
pyrophosphorylase EC:2.4.2.19 is involved in the de novo synthesis of NAD in both
prokaryotes and eukaryotes. It catalyses the reaction of quinolinic acid with
5-phosphoribosyl-1-pyrophosphate (PRPP) in the presence of Mg2+ to give rise to nicotinic
acid mononucleotide (NaMN), pyrophosphate and carbon dioxide [1,2]. Number of members:
26.

[1] Medline: 97169443. A new function for a common fold: the crystal structure of quinolinic
acid phosphoribosyltransferase. Eads JC, Ozturk D, Wexler TB, Grubmeyer C, Sacchettini
JC; Structure 1997;5:47-58.

[2] Medline: 96139309. The sequencing, expression, purification, and steady-state kinetic
analysis of quinolinate phosphoribosyl transferase from Escherichia coli. Bhatia R, Calvo
KC; Arch Biochem Biophys 1996;325:270-278.

1001. R3H domain

The name of the R3H domain comes from the characteristic spacing of the most conserved 
arginine and histidine residues. The function of the domain is predicted to be binding 
ssDNA. Number of members: 28 

[1] Medline: 99003905. The R3H motif: a domain that binds single-stranded nucleic acids.
Grishin NV; Trends Biochem Sci 1998;23:329-330.

1002. recF protein signatures (RecF) 

The prokaryotic protein recF [1,2] is a single-stranded DNA-binding protein which also
probably binds ATP. RecF is involved in DNA metabolism; it is required for recombinational
DNA repair and for induction of the SOS response. RecF is a protein of about 350 to 370
amino acid residues; there is a conserved ATP-binding site motif 'A' (P-loop) in the
N-terminal section of the protein as well as two other conserved regions, one located in the
central section, and the other in the C-terminal section. Signature patterns were derived from
these two regions.

Consensus pattern («^Mj[LW 

NO:2)'j -D Sequences known to belong to this class detected by the pattern ALL. 
Consensus pattem|W\W¥4^^ 

x(2)-[KRH]-x(3)-L. Sequences known to belong to this class detected by the pattern: ALL,
except for T. pallidum recF.

[1] Sandler S.J., Chackerian B., Li J.T., Clark A.J. Nucleic Acids Res. 20:839-845(1992).
[2] Alonso J.C., Fisher L.M. Mol. Gen. Genet. 246:680-686(1995).

1003. RibD C-terminal domain (RibD_C) 

The function of this domain is not known, but it is thought to be involved in riboflavin
biosynthesis. This domain is found in the C terminus of RibD/RibG Swiss:P25539, in
combination with dCMP_cyt_deam, as well as in isolation in some archaebacterial proteins
Swiss:P95872.

Number of members: 21

1004. Ribosomal protein L16 signatures (Ribosomal_L16)

Ribosomal protein L16 is one of the proteins from the large ribosomal subunit. In Escherichia
coli, L16 is known to bind directly the 23S rRNA and to be located at the A site of the
peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1], groups:

- Eubacterial L16.
- Algal and plant chloroplast L16.
- Cyanelle L16.
- Plant mitochondrial L16.

L16 is a protein of 133 to 185 amino-acid residues. As signature patterns, we
selected two conserved regions in the central section of these proteins.

Consensus pattern [KIt](2)-x-[GSAG}^ 

• 5 NO;760)l-[«¥ 

ffetVM^[LJ VM SEP ID NO:4Vj- [LFY]-[AP] Sequences known to belong to this class 
detected by the pattern ALL. 

Consensus pattern: R-M-G-x-[GR]-K-G-x(4)-[FWKR (SEQ ID NO:761)]. Sequences
known to belong to this class detected by the pattern: ALL.

[1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

1005. Ribosomal protein L32e signature (Ribosomal_L32E) 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis
of sequence similarities. One of these families consists of:

- Mammalian L32 [1].
- Drosophila RP49 [2].
- Trichoderma harzianum L32 [3].
- Yeast L32e (YBL092w).
- Archaebacterial L32e [4].

These proteins have 135 to 240 amino-acid residues. As a signature pattern, a stretch of about
20 residues located in the N-terminal part of these proteins was selected.

Consensus pattern: F-x-R-x(4)-[KR]-x(2)-[LIVMF (SEQ ID NO:2)]-x(3,5)-W-R-[KR]-x(2)-G.
Sequences known to belong to this class detected by the pattern: ALL.

[1] Jacks C.M., Powaser C.B., Hackett P.B. Gene 74:565-570(1988).
[2] Aguade M. Mol. Biol. Evol. 5:433-441(1988).
[3] Lora J.M., Garcia I., Benitez T., Llobell A., Pintor-Toro J.A. Nucleic Acids Res.
21:3319-3319(1993).
[4] Arndt E., Scholzen T., Kroemer W., Hatakeyama T., Kimura M. Biochimie 73:657-668(1991).

1006. (Ribosomal_S3) Ribosomal protein S3 signature 

PROSITE: PDOC00474. PROSITE cross-reference(s) PS00548; RIBOSOMAL_S3 

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. It 
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], 
groups: 

-Eubacterial S3. 

-Algal and plant chloroplast S3. 

-Cyanelle S3. 

-Archaebacterial S3. 

-Plant mitochondrial S3. 

-Vertebrate S3. 

-Insect S3. 

-Caenorhabditis elegans S3 (C23G10.3). 
-Yeast S3 (Rpl3). 

S3 is a protein of 209 to 559 amino-acid residues. A conserved region located in the C- 
terminal section was selected as a signature pattern. 

Consensus pattem|«ST^ 

m.NQ:l)l-xq)-fNQgCtfirNOSCH SEP ID NQ :5]9Y!-x(l JVlWFGAjlLtVF CA S EP ID 

Sequences known to belong to this class detected by the pattern: ALL, except for some
mitochondrial S3.

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

1007. RimM-RimM 

The RimM protein is essential for efficient processing of 16S rRNA [1]. The RimM protein 
was shown to have affinity for free ribosomal 30S subunits but not for 30S subunits in the 
70S ribosomes [1]. Number of members: 14.

[1] Medline: 98083058. RimM and RbfA are essential for efficient processing of 16S rRNA in
Escherichia coli. Bylund GO, Wipemo LC, Lundberg LA, Wikstrom PM; J Bacteriol
1998;180:73-82.

1008. RNA_pol_A - RNA polymerase alpha subunit

-!- RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes
contain a single RNA polymerase compared to three in eukaryotes (not including
mitochondrial and chloroplast polymerases).
-!- Members of this family include: A subunit from eukaryotes, gamma subunit from
cyanobacteria, beta' subunit from eubacteria, A' subunit from archaebacteria, B'' from
chloroplasts. Number of members: 139.

[1] Medline: 97066998. Structural modules of the large subunits of RNA polymerase.
Introducing archaebacterial and chloroplast split sites in the beta and beta' subunits of
Escherichia coli RNA polymerase. Severinov K, Mustaev A, Kukarin A, Muzzin O, Bass I,
Darst SA, Goldfarb A; J Biol Chem 1996;271:27969-27974.

1009. RuBisCO_large - Ribulose bisphosphate carboxylase large chain active site

PROSITE: PDOC00142; PROSITE cross-reference(s) PS00157; RUBISCO_LARGE

Ribulose bisphosphate carboxylase (EC 4.1.1.39) (RuBisCO) [1,2] catalyzes the
initial step in Calvin's reductive pentose phosphate cycle in plants as well as purple and green
bacteria. It consists of a large catalytic unit and a small subunit of undetermined function. In
plants, the large subunit is coded by the chloroplastic genome while the small subunit is
encoded in the nuclear genome. Molecular activation of RuBisCO by CO2 involves the
formation of a carbamate with the epsilon-amino group of a conserved lysine residue. This
carbamate is stabilized by a magnesium ion. One of the ligands of the magnesium ion is an
aspartic acid residue close to the active site lysine [3]. A pattern was developed which
includes both the active site residue and the metal ligand, and which is specific to RuBisCO
large chains.

Consensus pattern: G-x-[DN]-F-x-K-x-D-E [K is the active site residue] [The second D is a
magnesium ligand]. Sequences known to belong to this class detected by the pattern: ALL,
except for Cheilopleuria biscuspis RuBisCO.
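
As an illustration of how the annotated positions in this pattern might be reported, the sketch below hand-translates the pattern into a regular expression with named groups for the active site lysine and the magnesium-ligand aspartate. The function name `annotate_rubisco_site` and the toy sequence are hypothetical; the toy sequence is not a real RuBisCO large chain.

```python
import re

# G-x-[DN]-F-x-K-x-D-E, with the functionally annotated residues captured:
# "lys" is the active site lysine, "asp" is the second D (magnesium ligand).
RUBISCO_LARGE = re.compile(r"G.[DN]F.(?P<lys>K).(?P<asp>D)E")


def annotate_rubisco_site(sequence: str):
    """Return 1-based positions of the motif start, the lysine and the aspartate."""
    sites = []
    for match in RUBISCO_LARGE.finditer(sequence):
        sites.append(
            {
                "start": match.start() + 1,
                "lysine": match.start("lys") + 1,
                "aspartate": match.start("asp") + 1,
            }
        )
    return sites


# Hypothetical toy sequence containing one instance of the motif (GLDFTKDDE).
toy_sequence = "MMGLDFTKDDEAA"
sites = annotate_rubisco_site(toy_sequence)
```

For the toy sequence, the motif starts at residue 3, the lysine is residue 8 and the magnesium-ligand aspartate is residue 10.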

[1] Miziorko H.M., Lorimer G.H. Annu. Rev. Biochem. 52:507-535(1983).
[2] Akazawa T., Takabe T., Kobayashi H. Trends Biochem. Sci. 9:380-383(1984).
[3] Andersson I., Knight S., Schneider G., Lindqvist Y., Lundqvist T., Branden C.-I., Lorimer
G.H. Nature 337:229-234(1989).

1010. Rve - Integrase core domain

Integrase mediates integration of a DNA copy of the viral genome into the host chromosome.
Integrase is composed of three domains. The amino-terminal domain is a zinc-binding
domain (Integrase_Zn). This domain is the central catalytic domain. The carboxyl-terminal
domain is a non-specific DNA-binding domain (integrase). The catalytic domain acts as an
endonuclease when two nucleotides are removed from the 3' ends of the blunt-ended viral
DNA made by reverse transcription. This domain also catalyses the DNA strand transfer
reaction of the 3' ends of the viral DNA to the 5' ends of the integration site [1]. Number of
members: 694.

[1] Medline: 95099322. Crystal structure of the catalytic domain of HIV-1 integrase:
similarity to other polynucleotidyl transferases. Dyda F, Hickman AB, Jenkins TM, 
Engelman A, Craigie R, Davies DR; Science 1994;266:1981-1986. 

1011. (SBP_bac_3) Bacterial extracellular solute-binding proteins, family 3 signature 
PROSITE: PDOC00798. PROSITE cross-reference(s) PS01039; SBP_BACTERIAL_3

Bacterial high affinity transport systems are involved in active transport of solutes 
across the cytoplasmic membrane. The protein components of these traffic systems include 
one or two transmembrane protein components, one or two membrane-associated ATP- 
binding proteins (ABC transporters; see <PDOC00185>) and a high affinity periplasmic 
solute-binding protein. The latter are thought to bind the substrate in the vicinity of the inner
membrane, and to transfer it to a complex of inner membrane proteins for concentration into
the cytoplasm.

In gram-positive bacteria, which are surrounded by a single membrane and therefore have
no periplasmic region, the equivalent proteins are bound to the membrane via an
N-terminal lipid anchor. These homologous proteins do not play an integral role in the transport
process per se, but probably serve as receptors to trigger or initiate translocation of the solute
through the membrane by binding to external sites of the integral membrane proteins of the
efflux system.

In addition at least some solute-binding proteins function in the initiation of sensory 
transduction pathways. 

On the basis of sequence similarities, the vast majority of these solute-binding 
proteins can be grouped [1] into eight families of clusters, which generally correlate with the
nature of the solute bound. 

Family 3 groups together specific amino acids and opine-binding periplasmic proteins 
and a periplasmic homolog with catalytic activity: 

-Histidine-binding protein (gene hisJ) of Escherichia coli and related bacteria. A
homologous lipoprotein exists in Neisseria gonorrhoeae.
-Lysine/arginine/ornithine-binding proteins (LAO) (gene argT) of Escherichia coli and
related bacteria are involved in the same transport system as hisJ. Both solute-binding
proteins interact with a common membrane-bound receptor hisP of the binding protein
dependent transport system HisQMP.
-Glutamine-binding proteins (gene glnH) of Escherichia coli and Bacillus
stearothermophilus.
-Glutamate-binding protein (gene gluB) of Corynebacterium glutamicum.
-Arginine-binding proteins artI and artJ of Escherichia coli.
-Nopaline-binding protein (gene nocT) from Agrobacterium tumefaciens.
-Octopine-binding protein (gene occT) from Agrobacterium tumefaciens.
-Major cell-binding factor (CBF1) (gene: peb1A) from Campylobacter jejuni.
-Bacteroides nodosus protein aabA.
-Cyclohexadienyl/arogenate dehydratase of Pseudomonas aeruginosa, a periplasmic
enzyme which forms an alternative pathway for phenylalanine biosynthesis.
-Escherichia coli protein fliY.
-Vibrio harveyi protein patH.
-Escherichia coli hypothetical protein ydhW.
-Bacillus subtilis hypothetical protein yckB.
-Bacillus subtilis hypothetical protein yckK.

The signature pattern is located near the N-terminus of the mature proteins. 
Consensus pattemG-[F¥IL] EY IL . SEQ. ro 
5 NQ:!j]-[DE]-p,l¥^ 

VAGC SEP ID NQ:762) j-x(2)-tf^MA^^L[ VMAGN SE C) I D N O:763 )1 
Sequences known to belong to this class detected by the pattern: ALL.

[1] Tam R., Saier M.H. Jr. Microbiol. Rev. 57:320-346(1993).

1012. Sec7 - Sec7 domain 

The Sec7 domain is a guanine-nucleotide-exchange factor (GEF) for the Arf family [2].
Number of members: 32. 

[1] Medline: 98169075. Structure of the Sec7 domain of the Arf exchange factor ARNO.
Cherfils J, Menetrey J, Mathieu M, Le Bras G, Robineau S, Beraud-Dufour S, Antonny B,
Chardin P; Nature 1998;392:101-105.

[2] Medline: 97100951. A human exchange factor for ARF contains Sec7- and pleckstrin-
homology domains. Chardin P, Paris S, Antonny B, Robineau S, Beraud-Dufour S, Jackson
CL, Chabre M; Nature 1996;384:481-484.

1013. SecA_protein. SecA protein, amino terminal region

SecA protein binds to the plasma membrane where it interacts with proOmpA to support
translocation of proOmpA through the membrane. SecA protein achieves this translocation,
in association with SecY protein, in an ATP dependent manner. SecA possesses the ATPase
activity. The carboxyl terminus has similarity with the helicase carboxyl terminus. See 
Ribosomal_L5. Number of members: 45. 

[1] Medline: 98309858. Amino-terminal region of SecA is involved in the function of SecG
for protein translocation into Escherichia coli membrane vesicles. Mori H, Sugiyama H,
Yamanaka M, Sato K, Tagaya M, Mizushima S; J Biochem (Tokyo) 1998;124:122-129.

[2] Medline: 89251629. SecA protein hydrolyzes ATP and is an essential component of the
protein translocation ATPase of Escherichia coli. Lill R, Cunningham K, Brundage LA, Ito
K, Oliver D, Wickner W; EMBO J 1989;8:961-966.

1014. Seedstore_2S - 2S seed storage family 

Members of this family are composed of two chains (both included in the alignment); these
are co-translated and later cleaved. The two chains are linked by disulphide bonds. Number of
members: 27. 

[1] Medline: 97121264. 1H NMR assignment and global fold of napin BnIb, a representative
2S albumin seed protein. Rico M, Bruix M, Gonzalez C, Monsalve RI, Rodriguez R; 
Biochemistry 1996;35:15672-15682. 

1015. Smr - Smr domain 

This family includes the Smr (Small MutS Related) proteins, and the C-terminal region of the 
MutS2 protein. It has been suggested that this domain interacts with the MutS1 Swiss:P23909
protein in the case of Smr proteins and with the N-terminal MutS related region of MutS2 
Swiss:P94545 [1]. Number of members: 14. 

[1] Medline: 10431172. Smr: a bacterial and eukaryotic homologue of the C-terminal region
of the MutS2 family. Moreira D, Philippe H; Trends Biochem Sci 1999;24:298-300. 

1016. (SSF) Sodium:solute symporter family signatures and profile

PROSITE: PDOC00429. PROSITE cross-reference(s) PS00456; NA_SOLUT_SYMP_1
PS00457; NA_SOLUT_SYMP_2 PS50283; NA_SOLUT_SYMP_3

It has been shown [1,2] that integral membrane proteins that mediate the intake of a 
wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters) 
can be grouped, on the basis of sequence and functional similarities into a number of distinct 
families. One of these families is known as the sodium:solute symporter family (SSF) and 
currently consists of the following proteins: 
-Mammalian Na+/glucose co-transporter.
-Mammalian Na+/myo-inositol co-transporter.
-Mammalian Na+/nucleoside co-transporter.
-Mammalian Na+/neutral amino acid co-transporter.
-Escherichia coli Na+/proline symporter (gene putP).
-Escherichia coli Na+/pantothenate symporter (gene panF).
-Escherichia coli hypothetical protein yidK.
-Escherichia coli hypothetical protein yjcG.
-Bacillus subtilis hypothetical protein ywcA (ipa-31R).

These integral membrane proteins are predicted to comprise at least ten membrane 
spanning domains. Two conserved regions were selected as signature patterns; the first one is 
located in the fourth transmembrane region and the second one in a loop between two 
transmembrane regions in the C-terminal part of these proteins.

Consensus pattem[GS]-x(2)-[LIY]-x(3H^ SEQ ID 

NO:7^(10)-[LIY]-[TAV]-x(2)-G-G-[LMF]-x-[SAP]. Sequences known to belong to this 
class detected by the pattern: ALL.
Consensus pattern: [GAST (SEQ ID NO:179)]-[LIVM (SEQ ID NO:4)]-x(3)-
[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS (SEQ ID NO:765)]-[LIVMW (SEQ ID NO:175)]
Sequences known to belong to this class detected by the pattern: ALL,
except for E.coli yidK.

Note: this documentation entry is linked to both a signature pattern and a profile. As the
profile is much more sensitive than the pattern, you should use it if you have access to the
necessary software tools to do so.

[1] Reizer J., Reizer A., Saier M.H. Jr. Res. Microbiol. 141:1069-1072(1991).
[2] Reizer J., Reizer A., Saier M.H. Jr. Biochim. Biophys. Acta 1197:133-136(1994).

1017. SurE - Survival protein SurE 

E. coli cells with the surE gene disrupted are found to survive poorly in stationary phase [1]. 
It is suggested that SurE may be involved in stress response. Yeast also contains a member of 
the family Swiss:P38254. Swiss:P30887 can complement a mutation in acid phosphatase,
suggesting that members of this family could be phosphatases. Number of members: 17.

[1] Medline: 95014035. A new gene involved in stationary-phase survival located at 59
minutes on the Escherichia coli chromosome. Li C, Ichikawa JK, Ravetto JJ, Kuo HC, Fu JC,
Clarke S; J Bacteriol 1994;176:6015-6022.

[2] Medline: 93046805. Complementation of Saccharomyces cerevisiae acid phosphatase
mutation by a genomic sequence from the yeast Yarrowia lipolytica identifies a new
phosphatase. Treton BY, Le Dall MT, Gaillardin CM; Curr Genet 1992;22:345-355.

1018. Synuclein - Synuclein 

There are three types of synucleins in humans, called alpha, beta and gamma.
Alpha synuclein has been found mutated in families with autosomal dominant Parkinson's
disease. A peptide of alpha synuclein has also been found in amyloid plaques in Alzheimer's
patients. Number of members: 12.

[1] Medline: 98424410. The synuclein family. Lavedan C; Genome Res 1998;8:871-880.

1019. (T-box) T-box domain signatures 

PROSITE: PDOC00972. PROSITE cross-reference(s) PS01283; TBOX_1 PS01264;
TBOX_2 

A number of eukaryotic DNA-binding proteins contain a domain of about 170 to 190
amino acids known as the T-box domain [1,2,3], which probably binds DNA. The T-box
was first found in the mouse T locus (Brachyury) protein, a transcription factor involved
in mesoderm differentiation. It has since been found in the following proteins:
-Vertebrate and invertebrate homologs of the T protein.
-Mammalian proteins TBX1 to TBX6.
-Mammalian protein TBR1 which is expressed specifically in brain.
-Xenopus laevis eomesodermin (eomes).
-Xenopus laevis Vegt (or Antipodean), a transcription factor that activates the expression of
wnt-8, eomes and Brachyury.
-Chicken TbxT.
-Drosophila protein optomotor-blind (omb).
-Drosophila protein brachyenteron (byn) (also known as Trg), which is
required for the specification of the hindgut and anal pads.
-Drosophila protein H15.
-Caenorhabditis elegans protein tbx-12.
-Caenorhabditis elegans hypothetical proteins F21H11.3, F40H6.4, T07C4.2, T07C4.6 and
ZK177.10.

Two conserved regions were selected as signature patterns for the T-domain. The first region
corresponds to the N-terminal part of the domain and the second one to the central part.
Consensus pattemL-W-x(2MFC]-x(3,4)-|OT^ 

Sequences known to belong to this class detected by the pattern: ALL, except for C.elegans
ZK177.10. 

1 0 Consensus pattemllJVMYW^JVMYW SEO I D NQ:?67)]-H-tf^tf^ |'PA.DH SEP ID 
NO:7^ F Sequences known to belong to 

this class detected by the pattern: ALL, except for C.elegans tbx-12, ZK177.10 and Drosophila
H15. 

[1] Bollag R.J., Siegfried Z., Cebra-Thomas J.A., Garvey N., Davison E.M., Silver L.M. Nat.
Genet. 7:383-389(1994).
[2] Agulnik S.I., Garvey N., Hancock S., Ruvinsky I., Chapman D.L., Agulnik I., Bollag R.J.,
Papaioannou V.E., Silver L.M. Genetics 144:249-254(1996).
[3] Papaioannou V.E. Trends Genet. 13:212-213(1997).

1020. Toprim - Toprim domain

This is a conserved region from DNA primase. This corresponds to the Toprim domain 
common to DnaG primases, topoisomerases, OLD family nucleases and RecR proteins [1]. 
Both DnaG motifs IV and V are present in the alignment; the DxD (V) motif may be involved
in Mg2+ binding, and mutations to the conserved glutamate (IV) completely abolish DnaG
type primase activity [1]. DNA primase EC:2.7.7.6 is a nucleotidyltransferase; it synthesizes
the oligoribonucleotide primers required for DNA replication on the lagging strand of the
replication fork; it can also prime the leading strand and has been implicated in cell division
[2]. Number of members: 133.

[1] Medline: 98391745. Toprim--a conserved catalytic domain in type IA and II
topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Aravind L,
Leipe DD, Koonin EV; Nucleic Acids Res 1998;26:4205-4213.

[2] Medline: 97368180. Cloning and analysis of the dnaG gene encoding Pseudomonas putida
DNA primase. Szafranski P, Smith CL, Cantor CR; Biochim Biophys Acta 1997;1352:243-248.

[3] Medline: 94124015. The Haemophilus influenzae dnaG sequence and conserved bacterial
primase motifs. Versalovic J, Lupski JR; Gene 1993;136:281-286.

1021. TraB - TraB family

pAD1 is a hemolysin/bacteriocin plasmid originally identified in Enterococcus faecalis DS16.
It encodes a mating response to a peptide sex pheromone, cAD1, secreted by recipient
bacteria. Once the plasmid pAD1 is acquired, production of the pheromone ceases, a trait
related in part to a determinant designated traB. However, a related protein is found in C.
elegans Swiss:Q94217, suggesting that members of the TraB family have some more general
function. Number of members: 12.

[1] Medline: 94302142. Characterization of the determinant (traB) encoding sex pheromone
shutdown by the hemolysin/bacteriocin plasmid pAD1 in Enterococcus faecalis. An FY,
Clewell DB; Plasmid 1994;31:215-221.

1022. (Transpo_mutator) Transposases, Mutator family, signature 
PROSITE: PDOC00770. PROSITE cross-reference(s) PS01007;
TRANSPOSASE_MUTATOR

Autonomous mobile genetic elements such as transposons or insertion sequences (IS)
encode an enzyme, called transposase, required for excising and inserting the mobile element.
On the basis of sequence similarities, transposases can be grouped into various families. One
of these families has been shown [1,2,3,E1] to consist of transposases from the following
elements:

-Mutator from Maize.
-Is1201 from Lactobacillus helveticus.
-Is905 from Lactococcus lactis.
-Is1081 from Mycobacterium bovis.
-Is6120 from Mycobacterium smegmatis.
-Is406 from Pseudomonas cepacia.
-IsRm3 from Rhizobium meliloti.
-IsRm5 from Rhizobium meliloti.
-Is256 from Staphylococcus aureus.
-IsT2 from Thiobacillus ferrooxidans.

The maize Mutator transposase (MudrA) is a protein of 823 amino acids; the bacterial
transposases listed above are proteins of 300 to 420 amino acids. These proteins contain a
conserved domain of about 130 residues; a signature pattern was derived from the most
conserved part of this domain.

Consensus pattern: D-x(3)-G-[LIVMF (SEQ ID NO:2)]-x(6)-[STAV (SEQ ID
NO:105)]-[LIVMFYW (SEQ ID NO:26)]-[PT]-x-[STAV (SEQ ID NO:105)]-x(2)-[QR]-x-C-x(2)-H.
Sequences known to belong to this class detected by the pattern: ALL.

[1] Eisen J.A., Benito M.-I., Walbot V. Nucleic Acids Res. 22:2634-2636(1994).
[2] Guilhot C., Gicquel B., Davies J., Martin C. Mol. Microbiol. 6:107-113(1992).
[3] Wood M.S., Byrne A., Lessie T.G. Gene 105:101-105(1991).

1023. Transposase_8 - Transposase 

Transposase proteins are necessary for efficient DNA transposition. This family 
consists of various E. coli insertion elements and other bacterial transposases some of which
are members of the IS3 family. Number of members: 58. 

[1] Medline: 97324595. Genetic organization and transposition properties of IS511. D. A.
Mullin, D. L. Zies, A. H. Mullin, N. Caballera & B. Ely; Mol Gen Genet 1997;254:456-463.

[2] Medline: 97128810. The use of an improved transposon mutagenesis system for DNA
sequencing leads to the characterization of a new insertion sequence of Streptomyces lividans
66. J. Fischer, H. Maier, P. Viell & J. Altenbuchner; Gene 1996;180:81-89.

[3] Medline: 97074647. Identification and nucleotide sequence of Rhizobium meliloti
insertion sequence ISRm6, a small transposable element that belongs to the IS3 family. S.
Zekri & N. Toro; Gene 1996;175:43-48.

1024. tRNA_int_endo - tRNA intron endonuclease

Members of this family cleave pre-tRNA at the 5' and 3' splice sites to release the intron
EC:3.1.27.9. Number of members: 8.

[l]Medline: 97344075. Properties of H. volcanii tRNA intron endonuclease reveal a 
5 relationship between the archaeal and eucaryal tRNA intron processing systems. Kleman- 
Leyer K, Armbruster DW, Daniels CJ; Cell 1997;89:839-847. 

1025. Urease - Urease signatures 

PROSITE: PDOC00133. PROSITE cross-reference(s): PS01120, UREASE_1; PS00145,
UREASE_2

Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea
to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in
1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease
is a hexamer of identical chains. In bacteria [2], it consists of either two or three different
subunits (alpha, beta and gamma).

Urease binds two nickel ions per subunit; four histidines, an aspartate and a
carbamated lysine serve as ligands to these metals; an additional histidine is involved in the
catalytic mechanism [3].

As signatures for this enzyme, a region that contains two histidines that bind one of the
nickel ions and the region of the active site histidine were selected.

Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM (SEQ ID NO:4)]-D-x-H-[LIVM (SEQ ID NO:4)]-
H-x(3)-P [The two H's bind nickel]. Sequences known to belong to this class detected by the
pattern: ALL.

Consensus pattern: [LIVM (SEQ ID NO:4)](2)-[CT]-H-[HN]-L-x(3)-[LIVM (SEQ ID NO:4)]-
x(2)-D-[LIVM (SEQ ID NO:4)]-x-F-A [H is the active site residue]. Sequences known to
belong to this class detected by the pattern: ALL.

[1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).
[2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).
[3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

1026. Urease_beta - Urease beta subunit.

This subunit is known as alpha in Helicobacter. Number of members: 35.

[1] Medline: 95273988. The crystal structure of urease from Klebsiella aerogenes. Jabri E,
Carr MB, Hausinger RP, Karplus PA; Science 1995;268:998-1004.

1027. UvrD-helicase - UvrD/REP helicase 

The Rep family helicases are composed of four structural domains and function
as dimers. REP helicases catalyse ATP-dependent unwinding of double stranded DNA to
single stranded DNA. Swiss:P23478 and Swiss:P08394 have large insertions near the
carboxy-terminus relative to other members of the family. Number of members: 52.

[1] Medline: 97433075. Major domain swiveling revealed by the crystal structures of 
complexes of E. coli Rep helicase bound to single-stranded DNA and ADP. Korolev S, Hsieh 
J, Gauss GH, Lohman TM, Waksman G; Cell 1997;90:635-647. 


1028. V-type ATPase 116 kDa subunit family (V_ATPase_sub_a)

This family consists of the 116 kDa V-type ATPase (vacuolar (H+)-ATPase) subunits, as
well as V-type ATP synthase subunit i. The V-type ATPases are proton pumps that
acidify intracellular compartments in eukaryotic cells, for example yeast central vacuoles and
clathrin-coated and synaptic vesicles. They have important roles in membrane trafficking
processes [1]. The 116 kDa subunit (subunit a) in the V-type ATPase is part of the V0
functional domain responsible for proton transport. The a subunit is a transmembrane
glycoprotein with multiple putative transmembrane helices; it has a hydrophilic amino
terminal and a hydrophobic carboxy terminal [1,2]. It has roles in proton transport and
assembly of the V-type ATPase complex [1,2]. This subunit is encoded by two homologous
genes in yeast, VPH1 and STV1 [2].
Number of members: 27

[1] Forgac M; Medline: 99240666 "Structure and properties of the vacuolar (H+)-ATPases."
J Biol Chem 1999;274:12951-12954.

[2] Forgac M; Medline: 99270697 "Structure and properties of the clathrin-coated vesicle and
yeast vacuolar V-ATPases." J Bioenerg Biomembr 1999;31:57-65.

1029. Viral (Superfamily 1) RNA helicase (Viral_helicase1)
Number of members: 260

[1] Koonin EV, Dolja VV; Medline: 94094568 "Evolution and taxonomy of positive-strand
RNA viruses: implications of comparative analysis of amino acid sequences." Crit Rev
Biochem Mol Biol 1993;28:375-430.

1030. Vesicular monoamine transporter (VMAT) 

This family consists of various vesicular amine transporters with 12 transmembrane helices.
These include vesicular acetylcholine transporters (VAChT) [3] and vesicular monoamine
transporters (VMATs) [1,2], isoform 1 (adrenal) and isoform 2 (brain) (VMAT1 and VMAT2).

These proteins transport biogenic amines into synaptic vesicles or chromaffin granules [4].
VMATs pack monoamine neurotransmitters into secretory vesicles for regulated exocytotic
release. They also protect against the parkinsonian neurotoxin MPP+ by transporting it into
vesicles, preventing it from acting on mitochondria [1].

Also in the family is C. elegans UNC-17, a putative vesicular acetylcholine transporter;
mutations in UNC-17 cause impaired neuromuscular function, giving rise to jerky or
uncoordinated movement [4].
Number of members: 15

[1] Krantz DE, Peter D, Liu Y, Edwards RH; Medline: 97197857 "Phosphorylation of a
vesicular monoamine transporter by casein kinase II." J Biol Chem 1997;272:6752-6759.

[2] Erickson JD, Varoqui H, Schafer MK, Modi W, Diebler MF, Weihe E, Rand J, Eiden LE,
Bonner TI, Usdin TB; Medline: 94350930 "Functional identification of a vesicular
acetylcholine transporter and its expression from a 'cholinergic' gene locus." J Biol Chem
1994;269:21929-21932.

[3] Erickson JD, Schafer MK, Bonner TI, Eiden LE, Weihe E; Medline: 96209876 "Distinct
pharmacological properties and distribution in neurons and endocrine cells of two isoforms of
the human vesicular monoamine transporter." Proc Natl Acad Sci U S A 1996;93:5166-5171.

[4] Alfonso A, Grundahl K, Duerr JS, Han HP, Rand JB; Medline: 3342494 "The
Caenorhabditis elegans unc-17 gene: a putative vesicular acetylcholine transporter." Science
1993;261:617-619.

1031. WW/rsp5/WWP domain signature and profile. Cross-reference(s): PS01159,
WW_DOMAIN_1; PS50020, WW_DOMAIN_2

The WW domain [1-4] (also known as rsp5 or WWP) was originally discovered as a
short conserved region in a number of unrelated proteins, among them dystrophin, the product
of the gene responsible for Duchenne muscular dystrophy. The domain, which spans about 35
residues, is repeated up to 4 times in some proteins. It has been shown [5] to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp as well as that of a
conserved Pro. It is frequently associated with other domains typical for proteins in signal
transduction processes.
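
As a small illustration of the proline motif [AP]-P-P-[AP]-Y named above, the following
Python sketch scans a sequence for candidate WW-binding sites. The function name
find_ww_ligand_motifs and the example sequence are hypothetical; a hit only indicates that
the motif is present, not that binding occurs.

```python
import re

# [AP]-P-P-[AP]-Y, the proline motif quoted in the text, as a regular expression.
WW_LIGAND_MOTIF = re.compile(r"[AP]PP[AP]Y")


def find_ww_ligand_motifs(sequence: str):
    """Return (0-based start, matched peptide) for each motif occurrence."""
    return [(m.start(), m.group())
            for m in WW_LIGAND_MOTIF.finditer(sequence.upper())]


# Hypothetical sequence for illustration only.
hits = find_ww_ligand_motifs("MGAPPAYQQPPPPYS")
print(hits)
```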

Proteins containing the WW domain are listed below. 

- Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced form
consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a
cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms
tetramers and is thought to have multiple functions including involvement in membrane
stability, transduction of contractile forces to the extracellular environment and organization
of membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of
Duchenne or Becker type. Dystrophin contains one WW domain C-terminal to the spectrin
repeats.

- Utrophin, a dystrophin-like protein of unknown function.

- Vertebrate YAP protein, a substrate of an unknown serine kinase. It binds to the SH3
domain of the Yes oncoprotein via a proline-rich region. This protein appears in alternatively
spliced isoforms, containing either one or two WW domains [6].

- Mouse NEDD-4, which plays a role in the embryonic development and differentiation of the
central nervous system. It contains 3 WW modules followed by a HECT domain. The
human ortholog contains 4 WW domains, but the third WW domain is probably spliced out,
resulting in an alternate NEDD-4 protein with only 3 WW modules [3].

- Yeast RSP5, which is similar to NEDD-4 in its molecular organization. It contains an
N-terminal C2 domain (see <PDOC00380>), followed by a histidine-rich region, 3 WW
domains and a HECT domain.

- Rat FE65, a transcription-factor activator expressed preferentially in liver. The activator
domain is located within the N-terminal 232 residues of FE65, which also contain the WW
domain.

- Yeast ESS1/PTF1, a putative peptidyl prolyl cis-trans isomerase from family ppiC (see
<PDOC00840>). A related protein, dodo (gene dod), exists in Drosophila and in mammals
(gene PIN1).

- Tobacco DB10 protein. The WW domain is located N-terminal to the region with
similarity to ATP-dependent RNA helicases.

- IQGAP, a human GTPase activating protein acting on ras. It contains an N-terminal
domain similar to fly muscle mp20 protein and a C-terminal ras GTPase activator domain.

- Yeast pre-mRNA processing protein PRP40, Caenorhabditis elegans ZK1098.1 and fission
yeast SpAC13C5.02, which are related proteins with similarity to MYO2-type myosin, each
containing two WW domains at the N-terminus.

- Caenorhabditis elegans hypothetical protein C38D4.5, which contains one WW module, a
PH domain (see <PDOC50003>) and a C-terminal phosphatidylinositol 3-kinase domain.

- Yeast hypothetical protein YFL010c.

For the sensitive detection of WW domains, a profile spanning the whole homology region
was developed, as well as a pattern.

Description of pattern(s) and/or profile(s):

Consensus pattern: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE (SEQ ID NO:737)]-
[GSTQCR (SEQ ID NO:738)]-[FYW]-x(2)-P.

[1] Bork P., Sudol M. Trends Biochem. Sci. 19:531-533(1994).
[2] Andre B., Springael J.Y. Biochem. Biophys. Res. Commun. 205:1201-1205(1994).
[3] Hofmann K.O., Bucher P. FEBS Lett. 358:153-157(1995).
[4] Sudol M., Chen H.I., Bougeret C., Einbond A., Bork P. FEBS Lett. 369:67-71(1995).
[5] Chen H.I., Sudol M. Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823(1995).
[6] Sudol M., Bork P., Einbond A., Kastury K., Druck T., Negrini M., Huebner K.,
Lehman D. J. Biol. Chem. 270:14733-14741(1995).

1032. XPA protein signatures. PROSITE cross-reference(s): PS00752, XPA_1;
PS00753, XPA_2.

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer. Skin cells
of people with this condition are hypersensitive to ultraviolet light, due
to defects in the incision step of DNA excision repair. There are a minimum of
seven genetic complementation groups involved in this pathway: XP-A to XP-G.
XP-A is the most severe form of the disease and is due to defects in a 30 kDa
nuclear protein called XPA (or XPAC) [2].

The sequence of the XPA protein is conserved from higher eukaryotes [3] to
yeast (gene RAD14) [4]. XPA is a hydrophilic protein of 247 to 296 amino-acid
residues which has a C4-type zinc finger motif in its central section.

Two signature patterns were developed for XPA proteins. The first corresponds to the
zinc finger region, the second to a highly conserved region located some 12 residues after the
zinc finger region.

Consensus pattemC-x-[DE]-C<3)-ffeIVM^ 
25 F-x(4)-C-x(2)-C 

Consensus pattern: [LIVM (SEQ ID NO:4)](2)-T-[KR]-T-E-x-K-x-[DE]-Y-
[LIVMF (SEQ ID NO:2)](2)-x-D-x-[DE]

[1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[2] Miura N., Miyamoto I., Asahina H., Satokata I., Tanaka K., Okada Y. J. Biol. Chem.
266:19786-19789(1991).
[3] Shimamoto T., Kohno K., Tanaka K., Okada Y. Biochem. Biophys. Res. Commun.
181:1231-1237(1991).
[4] Bankmann M., Prakash L., Prakash S. Nature 355:555-558(1992).

1033. YCF9 

This family consists of the hypothetical protein product of the YCF9 gene from 
chloroplasts and cyanobacteria. Number of members: 16 

1034. (DUF15) 

It is highly conserved between eubacteria and eukaryotes. 
Number of members: 30 

1035. Lumenal portion of Cytochrome b559, alpha (gene psbE) subunit. (cytochr_b559a) 

This family is the lumenal portion of the cytochrome b559 alpha chain; matches to this family
should be accompanied by a match to the cytochr_b559 family also. The PROSITE pattern
matches the transmembrane region of the cytochrome b559 alpha and beta subunits.
Number of members: 16



A. Asparaginase 2 

Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be 
associated with the cell wall and whose expression is affected by the availability of nitrogen. 
Asparaginase II catalyzes the reaction L-asparagine + H2O = L-aspartate + NH3. As
many leukemias have high requirements for aspartic acid, asparaginase II proteins are useful 
as reagents for screening compounds for activity as leukemia chemotherapy products. 
Asparaginase II protein can also be over- or under-expressed to alter amino acid content in 
plant tissues or to modify nitrogen fixation and/or nitrogen metabolism in plants. 



Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12

B. Chloroa_b-bind

Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast 
and bind chlorophyll a and chlorophyll b, thereby triggering a chemical reaction 
(photosynthesis). These proteins are useful in controlling the rate, efficiency and/or output of 
photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to increase 
the rate of photosynthesis. 

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64
Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL_synthase

DMRL synthase (6,7-dimethyl-8-ribityllumazine synthase) catalyzes the last step in
riboflavin (vitamin B2) synthesis, condensing 5-amino-6-(1'-D)-ribitylamino-2,4(1H,3H)-
pyrimidinedione with L-3,4-dihydroxy-2-butanone 4-phosphate to produce
6,7-dimethyl-8-(1-D-ribityl)lumazine. The enzyme forms a homopentamer. Engineering of these
proteins or those with homologous sequences/structures may allow control of the amounts of
vitamin B2 available in plants and/or accumulation of pigment, as well as altering reactions
requiring hydrogen ion carriers/transmitters.

Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7 

D. E1_N

These proteins are ATP-dependent DNA helicases that are required for initiation of viral 
DNA replication. They form a complex with the viral E2 protein. The E1-E2 complex binds 
to the replication origin that contains binding sites for both proteins. The majority of 
sequences known for this group of proteins are from various papillomaviruses, a type of 
double stranded DNA virus. In plants, the prototype double stranded DNA virus is 
Cauliflower Mosaic virus (CaMV). Manipulation of these proteins, especially to produce 
variant proteins that form non-productive complexes, enables production of plants that are 
resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90
Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8

E. EF1 G 

Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma
subunits are presumed to play a role in anchoring the complex to other cellular components.
Studies of EF-1 genes in plants suggest that different forms of the EF-1 subunits may be
expressed in particular organs or in response to stress. Manipulation of the activity of these
proteins, either by altered expression level or by structural mutation, may result in the
accumulation of a particular protein in a chosen organ or allow production of particular
proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7
Dunn et al. (1993) Plant Mol Biol 23: 221-5
Aguilar et al. (1991) Plant Mol Biol 17: 351-60

F. ENV_polyprotein

This family comprises the envelope or coat proteins known from a number of different
retroviruses. In mammalian species, retroviruses are responsible for diseases such as
leukemia and AIDS. In plants, retroviruses are known in both monocot (e.g. Zeon-1) and dicot
(e.g. Arabidopsis and tobacco) species and have been shown to induce mutant alleles at new
loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous
or introduced retroviruses, in essence generating a new method for mutant production, gene
tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8
Grandbastien et al. (1989) Nature 337: 376-80
Wright and Voytas (1998) Genetics 149: 703-15

G. Glycosyl_hydr9

Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain)
catalyze the endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant
proteins with this domain exist and are expressed in an organ specific manner. They are
involved in the fruit ripening process, in cell elongation and plant reproduction. Modulation
of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for
engineering reproductive sterility.

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9
Tucker et al. (1988) Plant Physiol 88: 1257-62
Shani et al. (1997) 43: 837-42
Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of
1,4-α-glucosidic linkages in polysaccharides and remove successive maltose units from the
non-reducing ends of the chains. Mutants of β-amylase in Arabidopsis exhibited altered
degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also
influence the amount of pigment stored within particular cells. Manipulation of the β-amylase
genes enables control of plant pigmentation (for example, fibre pigment in cotton) as well as
carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65
Hirano and Nakamura (1997) Plant Physiol 114: 5675-82
Kitamoto et al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl_hydr15

Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze
the hydrolysis of terminal 1,4-linked alpha-D-glucose residues successively from the
non-reducing ends of the chains, resulting in the release of β-D-glucose. In plants these proteins
have been tied to the mobilization of the xyloglucan stored in the cotyledonary cell walls.
Proteins such as these could be varied to affect the rate of plant growth (for example during
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of
the properties, such as elasticity, of plant cell walls.

Ref: Crombie et al. (1998) Plant J 15: 27-38 

Hata et al. (1991) Agric Biol Chem 55: 941-9 

J. Glycosyl_hydr20

Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal
non-reducing N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides.
N-acetyl-β-glucosaminidase belongs to this family and exists in several different forms (consisting of
various combinations of alpha and beta chains) depending on the organism. Family 20 
glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff 
disease) and glycogen storage disease in humans. These types of proteins are also 
responsible for the hydrolysis of chitin. In plants, these proteins could be useful in 
controlling carbohydrate catabolism, thereby influencing the amount of sugars available for 
storage and/or use in other metabolic pathways. In addition, it is possible that such proteins 
could be used to engineer an endogenous insect protection mechanism, e.g. by secretion of a 
chitin-hydrolyzing composition by the plant. 

Ref: Graham et al (1988) J Biol Chem 263: 16823-9 
O'Dowd et al. (1988) Biochemistry 27: 5216-26 



K. HMG_box

The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins.
Numerous plant proteins contain this domain, such as the HMG1/2-like proteins. The
expression of some of these HMG proteins appears to be regulated by circadian rhythms and
in a light dependent manner, occurring at higher levels in roots, for example, and at lower levels
in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence
transcription regulation. In plants, HMGs are believed to have a role in maintaining patterns 
of circadian-regulated expression for other genes, suggesting that these proteins could be 
exploited to control growth and development. 

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501
Zheng et al. (1993) Plant Mol Biol 23: 813-23 
Grasser et al. (1993) Plant Mol Biol 23: 619-25 

L. IL2 

Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic
stimulation and is crucial for proper regulation and functioning of the immune response. IL-2 
is capable of stimulating B cells, monocytes, lymphokine-activated killer cells, natural killer 
cells and glioma cells. Plant extracts have also been shown to stimulate the immune system 
(for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in 
feedback inhibition pathways that impact the inflammatory response as well as the growth 
inhibition of tumor reactive T cells. Plant proteins containing IL-2-like sequences are useful 
as immunity-based therapeutics, acting in a manner similar to IL-2 in mammals. 

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 
Ariel et al. (1998) J Immunol 161: 2465-72 
Schink (1997) Anticancer Drugs 8 Suppl 1: S47-51 

M. Oxidored_FMN

NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced
acceptor. One member of this family is yeast "old yellow enzyme" (OYE), which is thought to
be involved in oxylipin metabolism. A second yeast family member is estrogen binding
protein (EBP), which binds estrogen in addition to exhibiting oxidoreductase activity. An
Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants
have been reported. Plant proteins from this class have the potential to be used to modify 
lipid metabolism/catabolism. These proteins may also have use as therapeutics for breast and 
prostate cancer, and other abnormal growth in steroid-sensitive tissues. 

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 
Schaller and Weiler (1997) J Biol Chem 272: 28066-72 
Mandani et al. (1994) PNAS USA 91: 922-6

N. Oxidored_q2

The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = 
NAD(+) + plastoquinol. In plants these reactions occur in the chloroplast and are believed to 
participate in a chloroplast respiratory system. Here, the NDH complex is postulated to act as 
a valve to remove excess reduction equivalents in the chloroplasts. Manipulation of these 
proteins may improve the rate or efficiency of photosynthesis. 

Ref: Burrows et al. (1998) EMBO J 17: 868-76 

Kofer et al (1998) Mol Gen Genet 258: 166-73 
Maier et al. (1995) J Mol Biol 251: 614-28 

O. PABP 

Polyadenylate binding proteins bind the poly (A) tail of mRNA. Plants, as exemplified by 
Arabidopsis, contain numerous PABP genes that are expressed in an organ-specific manner. 
For example, PABP2 is functional in roots and shoots, while PABP5 is expressed 
predominantly in immature flowers. The PABP proteins are implicated in numerous aspects 
of posttranscriptional regulation including mRNA turnover and translational initiation. 
Control of activity of PABP proteins provides the ability to control the expression of various 
genes in particular organs during development. 

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33
Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo_coat

Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid 
proteins. Plants are susceptible to infection by single stranded DNA viruses such as Maize 
streak virus (MSV) and various Gemini viruses. The coat proteins in these plant viruses are 
critical to the virus life cycle within the plant. For example, the coat protein of MSV is 
thought to be involved in intra- and inter-cellular movement within the plant. Engineering of 
proteins having similarity to parvoviral coat proteins, especially to produce proteins that 
interfere with maturation of the virus particle, enables the production of plants having better 
resistance to natural plant single-stranded DNA viruses. 

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70 
Rohde et al. (1990) Virology 176: 648-51 

Q. Pkinase_C

Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and 
are known to undergo serine-specific autophosphorylation and specifically phosphorylate two 
ribosomal proteins, PI 4 and PI 6. During development, these proteins predominate during 
high metabolic activity in growing buds, root tips, leaf margins and germinating seeds. They 
are thought to be involved in the control of plant growth and development. In addition, two 
genes encoding proteins from this family have been described that help plant cells adapt 
during cold or high salt stresses. Consequently, engineering Pkinase C proteins provides a 
way to control general growth/development of the plant as well as a means to provide 
endogenous protection against environmental stresses. 

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 

Mizoguchi et al. (1995) FEBS Lett 358: 199-204 



R. REV

The REV proteins act post-transcriptionally to relieve repression of GAG and ENV
production in retroviruses such as Human Immunodeficiency Virus type 1 (HIV-1). Plants
contain retrovirus-like viruses such as pararetroviruses and retrotransposons (i.e. transposons 
having long terminal repeats). Plant retrotransposons in particular have been used to create 
mutations at various loci, thereby permitting gene isolation, gene tagging and the like. 
Manipulation of plant REV proteins enables control of transposition frequencies of 
corresponding transposable elements and provides a new tool for genetic engineering of 
plants. 

Ref: Sodroski et al. (1986) Nature 321: 412-7
Franchini et al. (1989) PNAS USA 86: 2433-7
Marquet et al. (1995) 77: 113-24
Grandbastien et al. (1989) Nature 337: 376-80 
Wright and Voytas (1998) Genetics 149: 703-15 

S. RuBisCo_small

Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the
C3 photosynthetic carbon reduction cycle, adding carbon dioxide to D-ribulose
1,5-bisphosphate to form two molecules of 3-phospho-D-glycerate. RuBisCo is composed of
two subunits: a large one, which is synthesized in the chloroplast, and a small one, which is
synthesized in the cytoplasm and then transported into the chloroplast. The expression of the
small subunit of RuBisCo is light regulated. Manipulation of these proteins could increase
the efficiency of photosynthesis or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 
Dedonder et al. (1993) Plant Physiol 101: 801-8 

T. Sialyltransf

Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family
catalyze the following reaction:
CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP +
α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R.
These proteins are thought to be responsible for the synthesis of the sequence
NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on sugar chains O-linked to threonine or serine and
also as a terminal sequence on certain gangliosides in mammalian cells. In plants,
glycosyltransferases in the Golgi apparatus synthesize cell wall polysaccharides and elaborate
the complex glycans of glycoproteins. Engineering of plant sialyltransferases allows targeting
of proteins to particular cellular locations or enables the making of changes in cell wall
structure.

Ref: Wee et al. (1998) Plant Cell 10: 1759-68 

Lee et al. (1994) J Biol Chem 269: 10028-33 

Kitagawa and Paulson (1994) J Biol Chem 269: 1394-401 

U. Signal 

Many plant proteins in this family contain sequences similar to those found in both 
components of the prokaryotic family of signal transducers known as the two-component 
systems. This suggests that activation may require a transfer of a phosphate group between 
the transmitter domain and the receiver domain. One family member in Arabidopsis appears 
to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family 
appear to be involved in the regulation of gene transcription under conditions of 
environmental stress. Signal proteins can be exploited to affect plant growth and development 
and/or control plant responses to stress conditions such as cold, nutrient availability, etc. 

Ref: Chang et al. (1993) Science 262: 539-44 
Nagaya et al. (1993) Gene 131: 119-124
Gottfert et al. (1990) PNAS USA 87: 2680-4 

V. vMSA 

vMSA proteins are major surface antigens present on the envelope of various
retroviruses. Surface antigens of retroviruses are often involved in tropism of the virus.
Plants contain retrovirus-like viruses such as pararetroviruses and retrotransposons (i.e.
transposons having long terminal repeats). Plant retrotransposons in particular have been
used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the
like. Manipulation of plant vMSA proteins enables control of the tropism of plant retroviruses
that might be used as genetic engineering tools, thus enabling targeting of the virus to
particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 
Grandbastien et al. (1989) Nature 337: 376-80 
Wright and Voytas (1998) Genetics 149: 703-15 

W. zf-CCCH 

This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger 
domains. These proteins cover a broad range of functions. For example, the COP1 protein 
acts as a repressor of photomorphogenesis in darkness; light stimuli abolish this suppressive 
action. In addition, COP1 protein can function as a negative transcriptional regulator capable 
of direct interaction with components of the G-protein signaling pathway. As a second 
example, a zf-CCCH protein identified in Arabidopsis appears to be involved in the 
resistance to DNA damage induced by UV light and chemical DNA-damaging agents. 
Overexpression of this class of proteins permits production of plants that are better suited to 
adverse environments. Manipulation of expression of zf-CCCH proteins functioning as 
transcriptional regulators, such as COP1, enables manipulation of some signal transduction 
pathways. 
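
The defining motif given above, CX(8)CX(5)CX(3)H, can be located with a plain regular
expression. The sketch below is a minimal illustration; the function names and the toy
sequence are hypothetical, and the "at least two fingers" check simply mirrors the statement
that the family is defined by having two such domains.

```python
import re

# C-x(8)-C-x(5)-C-x(3)-H, the finger spacing stated in the text.
CCCH_FINGER = re.compile(r"C.{8}C.{5}C.{3}H")


def count_ccch_fingers(sequence: str) -> int:
    """Count non-overlapping CX(8)CX(5)CX(3)H matches in a protein sequence."""
    return sum(1 for _ in CCCH_FINGER.finditer(sequence))


def looks_like_zf_ccch(sequence: str) -> bool:
    """True when at least two fingers are found (assumed reading of 'two domains')."""
    return count_ccch_fingers(sequence) >= 2


# Hypothetical toy sequence: two fingers separated by a short linker.
finger = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
toy = "M" + finger + "GGGG" + finger
print(count_ccch_fingers(toy), looks_like_zf_ccch(toy))
```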

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53
Deng et al. (1992) Cell 71: 791-801

X. zf-RanBP 

Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may 
contain RANBP1-like or PPIase domains. Plant proteins having domains similar to these 
include PAS1 and GMSTI. PAS1 has been shown to have dramatic developmental effects 
that appear to be correlated with both cell division and cell wall elongation. GMSTI has high 
identity to the yeast STI stress-inducible gene and has been shown to be heat inducible. 
Proteins such as these may be useful for controlling growth and form of development. 
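
The repeat patterns named above can be counted with a short Python sketch such as the one below. It treats X as any residue and counts overlapping occurrences; the function name and the toy sequence are hypothetical, and the counts are only a rough indicator of repeat content.

```python
import re

# Lookahead patterns so that overlapping repeats are all counted.
# "." stands for any residue (the X in X-X-F-G and X-F-X-F-G).
XXFG = re.compile(r"(?=(..FG))")
XFXFG = re.compile(r"(?=(.F.FG))")


def count_fg_repeats(protein: str) -> dict:
    """Count overlapping X-X-F-G and X-F-X-F-G occurrences in a sequence."""
    seq = protein.upper()
    return {
        "XXFG": len(XXFG.findall(seq)),
        "XFXFG": len(XFXFG.findall(seq)),
    }


# Hypothetical toy sequence used only to exercise the patterns.
print(count_fg_repeats("MSAFGSFGFGQQ"))
```

Note that every X-F-X-F-G occurrence also contains an X-X-F-G occurrence, so the two counts are not independent.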

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43 
Hernandez Torres et al. (1995) 27: 1221-6 

Y. Peptidase M48. 

Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor 
and are located in the membranes of the endoplasmic reticulum. They function in NH2-terminal 
proteolytic processing, as shown for the yeast STE24 gene product. This gene is 
required for the correct processing of a-factor, a yeast pheromone. Family M48 peptidases 
also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX 
processing. Prenylation reactions are believed to be involved in the regulation of 
protein-protein and protein-membrane interactions. As an example, RAS GTPase activity is 
regulated in part by localization to the inner side of the plasma membrane upon prenylation. 
In plants, proteins from this family could be involved in pollen-stigma interactions such as 
those mediating self-pollination vs. outcrossing, or could be members of several secondary 
metabolism pathways. 
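
To make the COOH-terminal CAAX motif concrete, the sketch below checks whether a protein ends in a cysteine followed by two aliphatic residues and one further residue. The aliphatic set (A, V, L, I) and the function name are assumptions chosen for illustration; natural CAAX boxes tolerate a wider range of residues, so this is a deliberately strict toy test.

```python
# Assumption: "A" positions are restricted to A, V, L and I for illustration only.
ALIPHATIC = set("AVLI")


def has_caax_box(protein: str) -> bool:
    """True when the last four residues read C-aliphatic-aliphatic-any."""
    tail = protein.upper()[-4:]
    return (
        len(tail) == 4
        and tail[0] == "C"
        and tail[1] in ALIPHATIC
        and tail[2] in ALIPHATIC
    )


# Hypothetical stems; only the four C-terminal residues matter here.
print(has_caax_box("MKKCVIA"))   # ends in CVIA
print(has_caax_box("MKKCVIAK"))  # motif is no longer C-terminal
```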


Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85. 
Tam et al. (1998) J Cell Biol 142: 635-49. 

Z. DNA Pol Viral N 

The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase 
isolated from several retroid viruses such as the Cauliflower Mosaic Virus. The domain 
motif has also been found in numerous other species from humans to cyanobacteria. In these 
organisms, this motif seems to be associated with two types of sequences: retrotransposons 
and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved 
in the self-splicing conducted by group II introns. Various manipulations of this gene in 
plants allow control of the numerous retrotransposons endogenous to plant genomes or 
allow engineering of mitochondrial function, especially to increase efficiency of energy 
utilization by cells. 

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72 
Ferat and Michel (1993) Nature 364: 358-61 
Wilson et al. (1994) 368: 32-8 
Cambareri et al. (1994) 242: 658-65 
Gardner et al. (1981) NAR 9: 2871-2888 
Cummings et al. (1990) Curr Genet 17: 375-402 
Hattori et al. (1986) Nature 321: 625-8 

Aa. Calpain inhib 

This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain 
is a non-lysosomal calcium-dependent intracellular protease that appears to be involved in 
the dynamic changes of the cytoskeleton, especially actin-related structures, during early 
Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains and the subcellular 
distribution of calpastatin is thought to be important to calpain regulation [2]. In plants 
calpains and calpastatins could be involved in embryogenesis and non-embryogenic organ 
reiteration. Mutations occurring in calpain inhibitor repeat domains would produce 
developmental abnormalities such as abnormal leaf, root or flower development. 

Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42. 

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77. 

Ab. chorismate bind 

Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS 
gene products catalyze the first step in the biosynthesis of tryptophan by converting chorismate and 
L-glutamine to anthranilate, pyruvate and L-glutamate. Some of these gene products are subject to 
feedback inhibition by tryptophan [1] while some are feedback insensitive [2]. In 
Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS 
genes is induced by wounding and bacterial pathogen infiltration [1]. Mutations in the 
chorismate binding domain would affect the production of tryptophan and could influence the 
plant's defense system. AS gene products can be used for in vitro synthesis of tryptophan 
and tryptophan derivatives. 

Refs 

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33. 

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43. 

Ac. late protein L2 

Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to 
infection by double stranded DNA viruses such as Cauliflower Mosaic virus (CaMV). The 
coat proteins in these plant viruses are critical to the virus life cycle within the plant. For 
example, the coat protein of CaMV is thought to be involved in intra- and inter-cellular 
movement within the plant [1]. Engineering of proteins having similarity to papillomavirus 
coat proteins may enable the production of plants having better resistance to natural plant 
double stranded DNA viruses. 

Refs 

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8. 

Ad. Peptidase M41 

Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor 
and are integral membrane proteins. They seem to be involved in the degradation of 
carboxy-terminal-tagged cytoplasmic proteins. In plants, these proteins are located in the thylakoid 
membranes of the chloroplasts, their expression is light regulated, and they are thought to be 
involved in degradation of soluble stromal proteins and turnover of thylakoid proteins [1]. 
Manipulation of expression and structure of these proteins would have effects on the 
efficiency of photosynthesis and the development of chloroplasts. 

Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol 
Chem 271: 29329-34. 



Ae. UPF0051 

There is some evidence that, in plants, proteins in this family are involved in ATP synthesis 
in chloroplasts [1,2]. Mutations in these proteins or altering their expression would affect 
the efficiency of photosynthesis and energy production. 

Refs 

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70. 

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76. 

Af. E7 

Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early 
protein 7 (E7) is known as a potent immortalizing and transforming agent. Transformation by 
E7 is thought to be mediated by the physical association of E7 with cellular proteins 
regulating entry into the cell cycle [1]. The result is entry into the cell cycle and suppression 
of terminal differentiation in mammalian cells. Thus, engineering of proteins having 
similarity to papillomavirus E7 protein enables the production of plants having altered 
cellular proliferation characteristics and possibly altered morphology. For example, 
overexpression of E7-like proteins would be expected to result in proliferation of cells of the 
tissue in which the E7 protein is expressed, perhaps with suppression of differentiation 
events. Thus, for example, overexpression of E7-like proteins in meristem cells can result in 
taller plants and suppression of leafing and/or flowering. 

Refs 

1 Zwerschke W, Jansen-Durr P (2000) Adv Cancer Res 78: 1-29. 

Ag. Peptidase U7 

This protein is known to be an integral membrane protein in the cyanobacterium 
Synechocystis where it functions to digest cleaved signal peptides [1]. This activity is 
necessary to maintain proper secretion of mature proteins across the membrane. In higher 
plants this protein may be present in the plastid or chloroplast membranes where it would 
function by enabling protein movement into and out of the chloroplasts. Mutations in this 
protein would be expected to affect the development of plastids, including chloroplasts, or 
alter the energy transfer system within the chloroplasts, thereby affecting growth and 
development. 

Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, 
Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, 
Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, 
Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36. 



Ah. 5'-3' Exonuclease 

The 5'-3' exonuclease domain is one found in bacterial DNA polymerases I and in yeast DNA 
repair enzymes such as Exonuclease I. Yeast Exo I is involved in mitotic recombination and 
also includes a domain that interacts with the mismatch repair protein MSH2. The 5'-3' 
exonuclease domain is also present in XPG DNA repair enzymes in humans and in yeast 
RAD9 protein. Defects in XPG proteins result in Xeroderma Pigmentosum. Thus defects in 
5'-3' exonuclease domain-containing proteins in plants are expected to lead to defects in DNA 
repair and corresponding high spontaneous and inducible mutation rates. Consensus sequence 
(SEQ ID NO: 769): 

1MKKKLLLVDGSSLAFRAFFALPPLTNSAGEPTNAVYGFLKMLIKLIEQEQPTHIAVV 
FDAKAKTFRHELYEGYKAGRAP 
TPDELREQIPLIKELLDALGIPLLEVAGYEADDVIGTLAKLAEKEGYEVLIVTGDRDLL 
QLVSDHVTVIITKKGIAEFTL 
FTPEAVIEKYGLTPEQIIDYKALMGDSSDNIPGVKGIGEKTAAKLLQEYGSLEGIYANL 
DKLKGKKLREKLLAHKEDAKL 
SRDLATIKTDVPLDLTLDDLRLPDPDRDALDLLFDE 



Ref: 

Fiorentini P. et al. RT. Mol. Cell. Biol. 17:2764-2773(1997). 

Tishkoff et al. Cancer Res. 0:0-0(1998). 

Macinnes M.A. et al. Mol. Cell. Biol. 13:6393-6402(1993). 

AA. Activities of Polypeptides Comprising Signal Peptides 

Polypeptides comprising signal peptides are a family of proteins that are typically 
targeted (1) to a particular organelle or intracellular compartment, (2) to interact with a 
particular molecule or (3) for secretion outside of a host cell. Examples of polypeptides 
comprising signal peptides include, without limitation, secreted proteins, soluble proteins, 
receptors, proteins retained in the ER, etc. 

These proteins comprising signal peptides are useful to modulate ligand-receptor 
interactions, cell-to-cell communication, signal transduction, intracellular communication, 
and activities and/or chemical cascades that take place in an organism outside or within any 
particular cell. 

One class of such proteins is soluble proteins which are transported out of the cell. 
These proteins can act as ligands that bind to a receptor to trigger signal transduction or to 
permit communication between cells. 

Another class is receptor proteins which also comprise a retention domain that lodges 
the receptor protein in the membrane when the cell transports the receptor to the surface of 
the cell. Like the soluble ligands, receptors can also modulate signal transduction and 
communication between cells. 

In addition, the signal peptide itself can serve as a ligand for some receptors. An 
example is the interaction of the ER targeting signal peptide with the signal recognition 
particle (SRP). Here, the SRP binds to the signal peptide, halting translation, and the 
resulting SRP complex then binds to docking proteins located on the surface of the ER, 
prompting transfer of the protein into the ER. 






Signal peptide residue composition is described below in Subsection IV.C.1. 

III. Methods of Modulating Polypeptide Production 

It is contemplated that polynucleotides of the invention can be incorporated into a 
host cell or in vitro system to modulate polypeptide production. For instance, the SDFs 
prepared as described herein can be used to prepare expression cassettes useful in a number of 
techniques for suppressing or enhancing expression. 

For example, polynucleotides comprising sequences to be transcribed, such as 
coding sequences of the present invention, can be inserted into nucleic acid constructs to 
modulate polypeptide production. Typically, such sequences to be transcribed are 
heterologous to at least one element of the nucleic acid construct to generate a chimeric gene 
or construct. 

Another example of useful polynucleotides is nucleic acid molecules comprising 
regulatory sequences of the present invention. Chimeric genes or constructs can be generated 
when the regulatory sequences of the invention are linked to heterologous sequences in a vector 
construct. Within the scope of the invention are such chimeric genes and/or constructs. 

Also within the scope of the invention are nucleic acid molecules, whereof at least a part 
or fragment of these DNA molecules is presented in TABLE 1 of the present application, and 
wherein the coding sequence is under the control of its own promoter and/or its own regulatory 
elements. Such molecules are useful for transforming the genome of a host cell or an organism 
regenerated from said host cell for modulating polypeptide production. 

Additionally, a vector capable of producing the oligonucleotide can be inserted into the 
host cell to deliver the oligonucleotide. 

Components to be included in vector constructs are described in more detail both above 
and below. 

Whether the chimeric vectors or native nucleic acids are utilized, such 
polynucleotides can be incorporated into a host cell to modulate polypeptide production. 
Native genes and/or nucleic acid molecules can be effective when exogenous to the host cell. 

Methods of modulating polypeptide expression include, without limitation: 
Suppression methods, such as 
    Antisense 
    Ribozymes 
    Co-suppression 
    Insertion of Sequences into the Gene to be Modulated 
    Regulatory Sequence Modulation 

as well as Methods for Enhancing Production, such as 
    Insertion of Exogenous Sequences; and 
    Regulatory Sequence Modulation. 

III.A. Suppression 

Expression cassettes of the invention can be used to suppress expression of 
endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful, 
for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437 
(1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et al., 
Nature 357: 384-387 (1992)). 

As described above, a number of methods can be used to inhibit gene expression in 
plants, such as antisense, ribozyme, introduction of exogenous genes into a host cell, 
insertion of a polynucleotide sequence into the coding sequence and/or the promoter of the 
endogenous gene of interest, and the like. 

III.A.1. Antisense 

An expression cassette as described above can be transformed into a host cell or 
plant to produce an antisense strand of RNA. For plant cells, antisense RNA inhibits gene 
expression by preventing the accumulation of mRNA which encodes the enzyme of interest, see, 
e.g., Sheehy et al., Proc. Natl. Acad. Sci. USA 85:8805 (1988), and Hiatt et al., U.S. Patent No. 
4,801,340. 
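
The relationship between a coding sequence and the antisense RNA it would yield when placed in reverse orientation behind a promoter can be sketched in a few lines of Python. The function names and the six-base example are hypothetical; the sketch only illustrates base complementarity and does not model cassette design or silencing efficiency.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_coding_dna: str) -> str:
    """RNA complementary to the mRNA of the given sense-strand coding DNA."""
    return reverse_complement(sense_coding_dna).upper().replace("T", "U")


# Hypothetical six-base sense fragment (mRNA would read AUGGCC).
print(antisense_rna("ATGGCC"))
```

Read 5' to 3', the antisense transcript pairs base for base with the target mRNA in antiparallel orientation.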

III.A.2. Ribozymes 

Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA 
and down-regulate translation. 

III.A.3. Co-Suppression 

Another method of suppression is by introducing an exogenous copy of the gene 
to be suppressed. Introduction of expression cassettes in which a nucleic acid is configured in 
the sense orientation with respect to the promoter has been shown to prevent the accumulation of 
mRNA. This method is described in detail above. 



Reference No. 2750-942P 




III.A.4. Insertion of Sequences into the Gene to be Modulated 

Yet another means of suppressing gene expression is to insert a polynucleotide 
into the gene of interest to disrupt transcription or translation of the gene. 

Homologous recombination could be used to target a polynucleotide insert to a 
gene using the Cre-Lox system (A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C. 
Vergunst et al., Plant Mol. Biol. 38:393 (1998), H. Albert et al., Plant J. 7:649 (1995)). 

In addition, random insertion of polynucleotides into a host cell genome can also 
be used to disrupt the gene of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). 
In this method, screening for clones from a library containing random insertions is preferred for 
identifying those that have polynucleotides inserted into the gene of interest. Such screening can 
be performed using probes and/or primers described above based on sequences from TABLE 1, 
fragments thereof, and substantially similar sequences thereto. The screening can also be 
performed by selecting clones or any transgenic plants having a desired phenotype. 

III.A.5. Regulatory Sequence Modulation 

The SDFs described in Table 1, and fragments thereof, are examples of 
nucleotides of the invention that contain regulatory sequences that can be used to suppress or 
inactivate transcription and/or translation from a gene of interest as discussed in I.C.5. 



III.A.6. Genes Comprising Dominant-Negative Mutations 

When suppression of production of the endogenous, native protein is desired, it 
is often helpful to express a gene comprising a dominant-negative mutation. Production of 
protein variants from genes comprising dominant-negative mutations is a useful 
tool for research. Genes comprising dominant-negative mutations can produce a variant 
polypeptide which is capable of competing with the native polypeptide, but which does not 
produce the native result. Consequently, overexpression of genes comprising these mutations 
can titrate out an undesired activity of the native protein. For example, the product from a 
gene comprising a dominant-negative mutation of a receptor can be used to constitutively 
activate or suppress a signal transduction cascade, allowing examination of the phenotype 
and thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising 
from the gene comprising a dominant-negative mutation can be an inactive enzyme still capable 
of binding to the same substrate as the native protein and therefore competes with such native 
protein. 

Products from genes comprising dominant-negative mutations can also act upon 
the native protein itself to prevent activity. For example, the native protein may be active only 
as a homo-multimer or as one subunit of a hetero-multimer. Incorporation of an inactive subunit 
into the multimer with native subunit(s) can inhibit activity. 
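
The effect of an inactive subunit on a multimer can be made concrete with a simple calculation. The sketch below assumes, purely for illustration, that native and mutant subunits assemble at random and that a single mutant subunit inactivates the whole multimer; neither assumption is stated in the text, and the function name is hypothetical. Under those assumptions the fraction of fully native multimers is (1 - f)^n for mutant fraction f and n subunits.

```python
def active_multimer_fraction(mutant_fraction: float, subunits: int) -> float:
    """Fraction of multimers built only from native subunits.

    Assumes random assembly and that one mutant subunit abolishes activity.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits


# With equal amounts of native and mutant subunit in a tetramer:
print(active_multimer_fraction(0.5, 4))
```

Under this toy model, the loss of activity grows steeply with the number of subunits, which is one way to see why an inactive subunit can behave as a dominant negative.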

Thus, gene function can be modulated in host cells of interest by inserting into 
these cells vector constructs comprising a gene comprising a dominant-negative mutation. 

III.B. Enhanced Expression 

Enhanced expression of a gene of interest in a host cell can be accomplished by either 
(1) insertion of an exogenous gene; or (2) promoter modulation. 

III.B.1. Insertion of an Exogenous Gene 

Insertion of an expression construct encoding an exogenous gene can boost the 
number of gene copies expressed in a host cell. 

Such expression constructs can comprise genes that either encode the native 
protein that is of interest or that encode a variant that exhibits enhanced activity as compared to 
the native protein. Such genes encoding proteins of interest can be constructed from the 
sequences from TABLE 1, fragments thereof, and substantially similar sequences thereto. 

Such an exogenous gene can include either a constitutive promoter permitting 
expression in any cell in a host organism or a promoter that directs transcription only in 
particular cells or times during a host cell life cycle or in response to environmental stimuli. 

III.B.2. Regulatory Sequence Modulation 

The SDFs of Table 1, and fragments thereof, contain regulatory sequences that 
can be used to enhance expression of a gene of interest. For example, some of these sequences 
contain useful enhancer elements. In some cases, duplication of enhancer elements or insertion 
of exogenous enhancer elements will increase expression of a desired gene from a particular 
promoter. As other examples, all promoters require binding of a regulatory protein to be 
activated, while some promoters may need a protein that signals a promoter binding protein to 
expose a polymerase binding site. In either case, over-production of such proteins can be used 
to enhance expression of a gene of interest by increasing the activation time of the promoter. 

Such regulatory proteins are encoded by some of the sequences in TABLE 1, 
fragments thereof, and substantially similar sequences thereto. 

Coding sequences for these proteins can be constructed as described above. 



IV. Gene Constructs and Vector Construction 

To use isolated SDFs of the present invention or a combination of them or parts and/or 
mutants and/or fusions of said SDFs in the above techniques, recombinant DNA vectors which 
comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually 
prepared. The SDF construct can be made using standard recombinant DNA techniques 
(Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium- 
mediated transformation or by other means of transformation (e.g., particle gun 
bombardment) as referenced below. 

The vector backbone can be any of those typical in the art such as plasmids, viruses, 
artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by 

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); 
Hamilton et al., Proc. Natl. Acad. Sci. USA 93: 9975-9979 (1996); 

(b) YAC: Burke et al., Science 236: 806-812 (1987); 

(c) PAC: Sternberg N. et al., Proc. Natl. Acad. Sci. USA 87(1): 103-7 (1990); 

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl. Acids Res. 23: 4850-4856 (1995); 

(e) Lambda Phage Vectors: Replacement vector, e.g., 
Frischauf et al., J. Mol. Biol. 170: 827-842 (1983); or Insertion vector, e.g., 
Huynh et al., In: Glover NM (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL 
Press (1985); 

(f) T-DNA gene fusion vectors: Walden et al., Mol. Cell. Biol. 1: 175-194 (1990); 
and 

(g) Plasmid vectors: Sambrook et al., infra. 

Typically, a vector will comprise the exogenous gene, which in its turn comprises an 
SDF of the present invention to be introduced into the genome of a host cell, and which gene 
may be an antisense construct, a ribozyme construct, a chimeraplast, or a coding sequence with 
any desired transcriptional and/or translational regulatory sequences, such as promoters, UTRs, 
and 3' end termination sequences. Vectors of the invention can also include origins of 
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc. 

A DNA sequence coding for the desired polypeptide, for example a cDNA sequence 
encoding a full length protein, will preferably be combined with transcriptional and translational 
initiation regulatory sequences which will direct the transcription of the sequence from the gene 
in the intended tissues of the transformed plant. 

For example, for over-expression, a plant promoter fragment may be employed that will 
direct transcription of the gene in all tissues of a regenerated plant. Alternatively, the plant 
promoter may direct transcription of an SDF of the invention in a specific tissue (tissue-specific 
promoters) or may be otherwise under more precise environmental control (inducible 
promoters). 

If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the 
coding region is typically included. The polyadenylation region can be derived from the natural 
gene, from a variety of other plant genes, or from T-DNA. 

The vector comprising the sequences from genes or SDFs of the invention may 
comprise a marker gene that confers a selectable phenotype on plant cells. The vector can 
include promoter and coding sequence, for instance. For example, the marker may encode 
biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, 
bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or 
phosphinothricin. 
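
The ordering of elements described above (promoter, coding sequence, optional polyadenylation region, and an optional selectable marker on the same vector) can be sketched as a small data structure. All class names, element names and placeholder sequences below are hypothetical and chosen only for illustration; the sketch does not represent any particular vector of the invention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    name: str
    kind: str       # "promoter", "coding", "terminator" or "marker"
    sequence: str   # placeholder DNA string


@dataclass
class ExpressionCassette:
    promoter: Element
    coding: Element
    terminator: Optional[Element] = None  # e.g. a polyadenylation region

    def elements(self) -> List[Element]:
        """Elements in 5' to 3' order, skipping an absent terminator."""
        parts = [self.promoter, self.coding]
        if self.terminator is not None:
            parts.append(self.terminator)
        return parts

    def sequence(self) -> str:
        """Concatenated placeholder sequence of the cassette."""
        return "".join(e.sequence for e in self.elements())


def vector_layout(cassette: ExpressionCassette,
                  marker: Optional[Element] = None) -> List[str]:
    """Names of the cassette elements followed by an optional marker gene."""
    names = [e.name for e in cassette.elements()]
    if marker is not None:
        names.append(marker.name)
    return names


# Hypothetical elements with placeholder sequences.
cassette = ExpressionCassette(
    promoter=Element("constitutive_promoter", "promoter", "AAA"),
    coding=Element("sdf_coding", "coding", "ATGTAA"),
    terminator=Element("polyA_region", "terminator", "TTT"),
)
marker = Element("kanamycin_resistance", "marker", "GGG")
print(vector_layout(cassette, marker))
```

Keeping each element as a separate record makes it easy to swap, for instance, a constitutive promoter for a tissue-specific or inducible one without touching the rest of the layout.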

IV.A. Coding Sequences 

Generally, the sequence in the transformation vector and to be introduced into 
the genome of the host cell does not need to be absolutely identical to an SDF of the present 
invention. Also, it is not necessary for it to be full length, relative to either the primary 
transcription product or fully processed mRNA. Furthermore, the introduced sequence need not 
have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments 
can be incorporated into the coding sequence without changing the desired amino acid sequence 
of the polypeptide to be produced. 

IV.B. Promoters 

As explained above, introducing an exogenous SDF from the same species or an 
orthologous SDF from another species can modulate the expression of a native gene 
corresponding to that SDF of interest. Such an SDF construct can be under the control of 
either a constitutive promoter or a highly regulated inducible promoter (e.g., a copper 
inducible promoter). The promoter of interest can initially be either endogenous or 
heterologous to the species in question. When re-introduced into the genome of said species, 
such promoter becomes exogenous to said species. Over-expression of an SDF transgene can 
lead to co-suppression of the homologous endogenous sequence, thereby creating some 
alterations in the phenotypes of the transformed species as demonstrated by similar analysis 
of the chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., 
Plant Cell 2:291 (1990)). If an SDF is found to encode a protein with desirable 
characteristics, its over-production can be controlled so that its accumulation can be 
manipulated in an organ- or tissue-specific manner utilizing a promoter having such 
specificity. 

Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to 
be tissue-specific or developmentally regulated, such a promoter can be utilized to drive or 
facilitate the transcription of a specific gene of interest (e.g., seed storage protein or 
root-specific protein). Thus, the level of accumulation of a particular protein can be manipulated 
or its spatial localization in an organ- or tissue-specific manner can be altered. 

IV.C. Signal Peptides 

SDFs of the present invention containing signal peptides are indicated in Table 1. In 
some cases it may be desirable for the protein encoded by an introduced exogenous or 
orthologous SDF to be targeted (1) to a particular organelle or intracellular compartment, (2) to 
interact with a particular molecule such as a membrane molecule or (3) for secretion outside 
of the cell harboring the introduced SDF. This can be accomplished using a signal peptide. 

Signal peptides direct protein targeting, are involved in ligand-receptor interactions 
and act in cell-to-cell communication. Many proteins, especially soluble proteins, contain a 
signal peptide that targets the protein to one of several different intracellular compartments. 
In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER), 
mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein 
storage vesicles (PSV) and, in general, membranes. Some signal peptide sequences are 
conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide 
signal that targets proteins to the vacuole (Marty (1999) The Plant Cell 11: 587-599). Other 
signal peptides do not have a consensus sequence per se, but are largely composed of 
hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale 
and Denecke (1999) The Plant Cell 11: 615-628). Still others do not appear to contain either 
a consensus sequence or an identified common secondary structure, for instance the 
chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11: 
557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an 
organelle and then to a membrane within the organelle (e.g. within the thylakoid lumen of the 
chloroplast; see Keegstra and Cline (1999) The Plant Cell 11: 557-570). In addition to the 
diversity in sequence and secondary structure, placement of the signal peptide is also varied. 
Proteins destined for the vacuole, for example, have targeting signal peptides found at the 
N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal 
peptides also serve as ligands for some receptors. 
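
Two of the features mentioned above, the conserved Asn-Pro-Ile-Arg (NPIR) motif and a hydrophobic residue composition, lend themselves to a simple illustrative check. In the Python sketch below, the 40-residue search window, the hydrophobic residue set and the function names are assumptions made for the example, and the toy sequence is hypothetical; real targeting prediction relies on far more elaborate models.

```python
# Assumption: one common grouping of hydrophobic residues, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def has_npir_motif(protein: str, window: int = 40) -> bool:
    """True when Asn-Pro-Ile-Arg occurs within the first `window` residues."""
    return "NPIR" in protein.upper()[:window]


def hydrophobic_fraction(protein: str, start: int = 0, length: int = 20) -> float:
    """Fraction of hydrophobic residues in protein[start:start + length]."""
    region = protein.upper()[start:start + length]
    if not region:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in region) / len(region)


# Hypothetical toy sequence: hydrophobic N-terminal stretch followed by NPIR.
toy = "MKAFLLALLLVASSNPIRLPTT"
print(has_npir_motif(toy), hydrophobic_fraction(toy, 0, 12))
```

Because some targeting peptides have no consensus sequence at all, a negative result from checks like these carries little weight.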

These characteristics of signal peptides can be used to more tightly control the 
phenotypic expression of introduced SDFs. In particular, associating the appropriate signal 
sequence with a specific SDF can allow sequestering of the protein in specific organelles 
(plastids, as an example), secretion outside of the cell, targeting interaction with particular 
receptors, etc. Hence, the inclusion of signal peptides in constructs involving the SDFs of the 
invention increases the range of manipulation of SDF phenotypic expression. The nucleotide 
sequence of the signal peptide can be isolated from characterized genes using common 
molecular biological techniques or can be synthesized in vitro. 

In addition, the native signal peptide sequences, both amino acid and nucleotide, 
described in Table 1 can be used to modulate polypeptide transport. Further variants of the 
native signal peptides described in Table 1 are contemplated. Insertions, deletions, or 
substitutions can be made. Such variants will retain at least one of the functions of the native 
signal peptide as well as exhibiting some degree of sequence identity to the native sequence. 

Also, fragments of the signal peptides of the invention are useful and can be fused with 
other signal peptides of interest to modulate transport of a polypeptide. 

V. Transformation Techniques 

A wide range of techniques for inserting exogenous polynucleotides are known for a 
number of host cells, including, without limitation, bacterial, yeast, mammalian, insect and plant 
cells. 

Techniques for transforming a wide variety of higher plant species are well known and 
described in the technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 
22:421 (1988); and Christou, Euphytica, v. 85, n.1-3:13-27 (1995). 

DNA constructs of the invention may be introduced into the genome of the desired plant 
host by a variety of conventional techniques. For example, the DNA construct may be 
introduced directly into the genomic DNA of the plant cell using techniques such as 
electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be 
introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. 
Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and 
introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions 
of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent 
marker into the plant cell DNA when the cell is infected by the bacteria (McCormac et al., Mol.
Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141
(1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)).

Microinjection techniques are known in the art and well described in the scientific and 
patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is 
described in Paszkowski et al., EMBO J. 3:2717 (1984). Electroporation techniques are
described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation
techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium
tumefaciens-mediated transformation techniques, including disarming and use of binary or
co-integrate vectors, are well described in the scientific literature. See, for example, Hamilton, C.M.,
Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 207:171 (1987); Komari et al., Plant J.
10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991); Gleave, A.P., Plant Mol.
Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986); and Gould et al., Plant
Physiology 95:426 (1991).

Transformed plant cells which are derived by any of the above transformation 
techniques can be cultured to regenerate a whole plant that possesses the transformed genotype 
and thus the desired phenotype such as seedlessness. Such regeneration techniques rely on 
manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a 
biocide and/or herbicide marker which has been introduced together with the desired nucleotide 
sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts 
Isolation and Culture in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing 
Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73,
CRC Press, Boca Raton, 1988. Regeneration can also be obtained from plant callus, explants,
organs, or parts thereof. Such regeneration techniques are described generally in Klee et al. Ann. 
Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama 
et al. (Biosci Biotechnol Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol 32:1 
(1994)). The nucleic acids of the invention can be used to confer desired traits on essentially any 
plant. 

Thus, the invention has use over a broad range of plants, including species from the 
genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, 
Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium,
Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus,
Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum,
Pennisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale,
Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna,
and Zea.

One of skill will recognize that after the expression cassette is stably incorporated in 
transgenic plants and confirmed to be operable, it can be introduced into other plants by 
sexual crossing. Any of a number of standard breeding techniques can be used, depending 
upon the species to be crossed. 

The particular sequences of SDFs identified are provided in the attached TABLE 1.
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic 
DNA fragments or polypeptides constituting desired sequences by recombinant methodology 
known in the art or described herein. 

EXAMPLES 

The invention is illustrated by way of the following examples. The invention is not 
limited by these examples as the scope of the invention is defined solely by the claims 
following. 

EXAMPLE 1: cDNA PREPARATION 

A number of the nucleotide sequences disclosed in TABLE 1 herein as representative of 
the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA 
from corn plants grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred 
International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131-0256.

A number of the nucleotide sequences disclosed in TABLE 1 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from
Arabidopsis thaliana, Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA
from such plants as described below. This is a true breeding strain. Seeds of the plant are
available from the Arabidopsis Biological Resource Center at the Ohio State University,
under the accession number CS2360. Seeds of this plant were deposited under the terms and
conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA
on August 31, 1999, and were assigned ATCC No. PTA-595.

Other methods for cloning full-length cDNA are described, for example, by Seki et
al., Plant Journal 15:707-720 (1998) "High-efficiency cloning of Arabidopsis full-length
cDNA by biotinylated Cap trapper"; Maruyama et al., Gene 138:171 (1994) "Oligo-capping: a
simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides";
and WO 96/34981.

Tissues were, or each organ was, individually pulverized and frozen in liquid
nitrogen. Next, the samples were homogenized in the presence of detergents and then
centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by
treatment with detergents and proteinase K followed by ethanol precipitation and
centrifugation. The polysomal RNA from the different tissues was pooled according to the
following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods
described below.

Starting material for cDNA synthesis for the exemplary corn cDNA clones
with sequences presented in TABLE 1 was poly(A)-containing polysomal mRNAs from
inflorescences and root tissues of corn plants grown from HYBRID SEED # 35A19. Male
inflorescences and female (pre- and post-fertilization) inflorescences were isolated at various
stages of development. Selection for poly(A)-containing polysomal RNA was done using
oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology:
A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The quality and the
integrity of the polyA+ RNAs were evaluated.

Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA
clones with sequences presented in TABLE 1 was polysomal RNA isolated from the
topmost inflorescence tissues of Arabidopsis thaliana Wassilewskija (Ws.) and from roots of
Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis
Biological Resource Center. Nine parts inflorescence to every part root were used, as
measured by wet mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the
sample was homogenized in the presence of detergents and then centrifuged. The debris and 
nuclei were removed from the sample and more detergents were added to the sample. The 
sample was centrifuged and the debris was removed and the sample was applied to a 2M 
sucrose cushion to isolate polysomal RNA. Cox et al., "Plant Molecular Biology: A Practical 
Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The polysomal RNA was used 
for cDNA synthesis by the methods described below. Polysomal mRNA was then isolated as 
described above for corn cDNA. The quality of the RNA was assessed electrophoretically. 

Following preparation of the mRNAs from various tissues as described above, selection 
of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of
such mRNA was performed using either a chemical or enzymatic approach. Both techniques 
take advantage of the presence of the "cap" structure, which characterizes the 5' end of most 
intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position. 

The chemical modification approach involves the optional elimination of the 2', 3'-cis
diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of
the 5' ends of the mRNAs into a dialdehyde, and the coupling of the resulting dialdehyde to
a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for
obtaining mRNAs having intact 5' ends is disclosed in International Application No.
WO 96/34981 published November 7, 1996.

The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of
mRNAs involves the removal of the phosphate groups present on the 5' ends of uncapped
incomplete mRNAs, the subsequent decapping of mRNAs having intact 5' ends and the ligation
of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further
detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is
disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des
ADNc complets: difficultes et perspectives nouvelles. Apports pour l'etude de la regulation de
l'expression de la tryptophane hydroxylase de rat, 20 Dec. 1993), EPO 625572 and Kato et al.,
Gene 150:243-250 (1994).

In both the chemical and the enzymatic approach, the oligonucleotide tag has a
restriction enzyme site (e.g. an EcoRI site) therein to facilitate later cloning procedures.
Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA is
examined by performing a Northern blot using a probe complementary to the oligonucleotide
tag.

For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic
method, first strand cDNA synthesis is performed using an oligo-dT primer with reverse
transcriptase. This oligo-dT primer can contain an internal tag of at least 4 nucleotides, which
can be different from one mRNA preparation to another. Methylated dCTP is used for cDNA
first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps.
The first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline
hydrolysis to eliminate residual primers.

Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow
fragment, and a primer corresponding to the 5' end of the ligated oligonucleotide. The primer is
typically 20-25 bases in length. Methylated dCTP is used for second strand synthesis in order to
protect internal EcoRI sites in the cDNA from digestion during the cloning process.

Following second strand synthesis, the full-length cDNAs are cloned into a phagemid
vector, such as pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with
T4 DNA polymerase (Biolabs) and the cDNA is digested with EcoRI. Since methylated dCTP
is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated
site and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.

The full-length cDNAs are then size fractionated using either exclusion chromatography
(AcA, Biosepra) or electrophoretic separation, which yields 3 to 6 different fractions. The
full-length cDNAs are then directionally cloned into pBlueScript™ using either the EcoRI and
SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the
EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as
follows.

The plasmid cDNA libraries made as described above are purified (e.g. by a column 
available from Qiagen). A positive selection of the tagged clones is performed as follows. 
Briefly, in this selection procedure, the plasmid DNA is converted to single stranded DNA using
phage F1 gene II endonuclease in combination with an exonuclease (Chang et al., Gene 127:95
(1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992).
Here the single stranded DNA is hybridized with a biotinylated oligonucleotide having a 
sequence corresponding to the 3' end of the oligonucleotide tag. Preferably, the primer has a 
length of 20-25 bases. Clones including a sequence complementary to the biotinylated 
oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by 
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the 
magnetic beads and converted into double stranded DNA using a DNA polymerase such as 
ThermoSequenase™ (obtained from Amersham Pharmacia Biotech). Alternatively, protocols 
such as the Gene Trapper™ kit (Gibco BRL) can be used. The double stranded DNA is then 
transformed, preferably by electroporation, into bacteria. The percentage of positive clones 
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot 
analysis. 

Following transformation, the libraries are ordered in microtiter plates and sequenced. 
The Arabidopsis library was deposited at the American Type Culture Collection on January 
7, 2000 as "E-coli liba 010600" under the accession number PTA-1161.

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS

The SDFs of the invention can be used in Southern hybridizations as described above. 
The following describes extraction of DNA from nuclei of plant cells, digestion of the 
nuclear DNA and separation by length, transfer of the separated fragments to membranes, 
preparation of probes for hybridization, hybridization and detection of the hybridized probe. 

The procedures described herein can be used to isolate related polynucleotides or for 
diagnostic purposes. Moderate stringency hybridization conditions, as defined above, are 
described in the present example. These conditions result in detection of hybridization 
between sequences having at least 70% sequence identity. As described above, the 
hybridization and wash conditions can be changed to reflect the desired percentage of
sequence identity between probe and target sequences that can be detected. 

In the following procedure, a probe for hybridization is produced from two PCR 
reactions using two primers from genomic sequence of Arabidopsis thaliana. As described 
above, the particular template for generating the probe can be any desired template. 

The first PCR product is assessed to confirm that it is of the expected size. Then the
product of the first PCR is used as a template, with the same pair
of primers used in the first PCR, in a second PCR that produces a labeled product used as the
probe.

Fragments detected by hybridization, or other bands of interest, can be isolated from
gels used to separate genomic DNA fragments by known methods for further purification
and/or characterization.



Buffers for nuclear DNA extraction

1. 10X HB

                              1000 ml
   40 mM spermidine           10.2 g     Spermine (Sigma S-2876) and spermidine (Sigma
   10 mM spermine             3.5 g      S-2501) stabilize chromatin and the nuclear membrane
   0.1 M EDTA (disodium)      37.2 g     EDTA inhibits nuclease
   0.1 M Tris                 12.1 g     Buffer
   0.8 M KCl                  59.6 g     Adjusts ionic strength for stability of nuclei

   Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in
   leaves. Use of pH 9.5 appears to inactivate this nuclease.



2. 2 M sucrose (684 g per 1000 ml)

Heat about half the final volume of water to about 50°C. Add the sucrose slowly then 
bring the mixture to close to final volume; stir constantly until it has dissolved. Bring 
the solution to volume. 

3. Sarkosyl solution (lyses nuclear membranes)

                                      1000 ml
   N-lauroyl sarcosine (Sarkosyl)     20.0 g
   0.1 M Tris                         12.1 g
   0.04 M EDTA (disodium)             14.9 g

   Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper
   volume.

4. 20% Triton X-100
   80 ml Triton X-100
   320 ml 1X HB (w/o β-ME and PMSF)
   Prepare in advance; Triton takes some time to dissolve.
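
The gram amounts in the buffer recipes above can be cross-checked from the stated molarities. The following is a minimal sketch, not part of the original protocol. The formula weights are assumptions: the text gives only Sigma catalog numbers, so the hydrochloride salts of spermidine and spermine, the dihydrate of disodium EDTA, and Tris base are assumed here. The names `FORMULA_WEIGHT`, `grams_needed` and `recipe` are illustrative.

```python
# Assumed formula weights in g/mol (not stated in the protocol text).
FORMULA_WEIGHT = {
    "spermidine trihydrochloride": 254.63,
    "spermine tetrahydrochloride": 348.18,
    "EDTA disodium dihydrate": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
    "sucrose": 342.30,
}

# Molar concentrations taken from the recipes (mol/l).
TEN_X_HB = {
    "spermidine trihydrochloride": 0.040,
    "spermine tetrahydrochloride": 0.010,
    "EDTA disodium dihydrate": 0.1,
    "Tris base": 0.1,
    "KCl": 0.8,
}
SARKOSYL_SOLUTION = {
    "Tris base": 0.1,
    "EDTA disodium dihydrate": 0.04,
}


def grams_needed(molar, formula_weight, volume_l=1.0):
    """Mass of solute in grams for the given molarity and final volume."""
    return molar * formula_weight * volume_l


def recipe(molarities, volume_l=1.0):
    """Grams of each component, rounded to 0.1 g, for a batch of volume_l liters."""
    return {
        name: round(grams_needed(m, FORMULA_WEIGHT[name], volume_l), 1)
        for name, m in molarities.items()
    }


if __name__ == "__main__":
    for name, grams in recipe(TEN_X_HB).items():
        print(f"10X HB  {name}: {grams} g per liter")
    for name, grams in recipe(SARKOSYL_SOLUTION).items():
        print(f"Sarkosyl solution  {name}: {grams} g per liter")
    print("2 M sucrose:", grams_needed(2.0, FORMULA_WEIGHT["sucrose"]), "g per liter")
```

Under these assumed formula weights the computed masses agree with the listed amounts to within rounding. Passing a different `volume_l` scales a recipe to a smaller batch.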

A. Procedure

1. Prepare 1X "H" buffer (keep ice-cold during use)

                              1000 ml
   10X HB                     100 ml
   2 M sucrose                250 ml     a non-ionic osmoticum
   Water                      634 ml

   Added just before use:
   100 mM PMSF*               10 ml      a protease inhibitor; protects
                                         nuclear membrane proteins
   β-mercaptoethanol          1 ml       inactivates nuclease by reducing
                                         disulfide bonds

   *100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626)
   (add 0.0875 g to 5 ml 100% ethanol)



2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure
that you use 5-10 ml of HB buffer per gram of tissue. Blenders generate heat so be
sure to keep the homogenate cold. It is necessary to put the blenders in ice
periodically.

Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 
20 min. This lyses plastid, but not nuclear, membranes. 

Filter the tissue suspension through several nylon filters into an ice-cold beaker. The 
first filtration is through a 250-micron membrane; the second is through an 85-micron 
membrane; the third is through a 50-micron membrane; and the fourth is through a 
20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up 
by gently squeezing the liquid through the filters. 

Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei. 

Discard the dark green supernatant. The pellet will have several layers to it. One is 
starch; it is white and gritty. The nuclei are gray and soft. In the early steps, there 
may be a dark green and somewhat viscous layer of chloroplasts. 

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by
swirling gently and pipetting. After the pellets are resuspended, pellet the nuclei
again at 1200 - 1300 x g. Discard the supernatant.

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a 
pale green. This usually happens after 3 or 4 resuspensions. At this point, the pellet 
is typically grayish white and very slippery. The Triton X-100 in these repeated steps 
helps to destroy the chloroplasts and mitochondria that contaminate the prep. 

Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the
suspension to a sterile 125 ml Erlenmeyer flask.

Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5)
while swirling gently. This lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in
solution. The mixture will be gray, white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin
is, the firmer the protein pellicle. 

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) 
under a protein pellicle. Carefully remove the solution under the protein pellicle and 
above the pellet. Determine the density of the solution by weighing 1 ml of solution 
and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved 
solids (sucrose etc) and the refractive index alone will not be an accurate guide to 
CsCl concentration. 

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer 
pipette and discard. Carefully remove the DNA band with another transfer pipette. 
The DNA band is usually visible in room light; otherwise, use a long wave UV light 
to locate the band. 

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once 
the solution is clear, extract at least two more times to ensure that all of the EtBr is 
gone. Be very gentle, as it is very easy to shear the DNA at this step. This extraction 
may take a while because the DNA solution tends to be very viscous. If the solution 
is too viscous, dilute it with TE. 

15. Dialyze the DNA for at least two days against several changes (at least three times) of
TE (10 mM Tris, 1 mM EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a
lot of debris, centrifuge the DNA solution at least at 2500 x g for 10 min. and
carefully transfer the clear supernatant to a new tube. Read the A260 concentration of
the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the
DNA. Load 50 ng and 100 ng (based on the OD reading) and compare it with known
and good quality DNA. Undigested lambda DNA and a lambda-HindIII-digested
DNA are good molecular weight markers.
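
Steps 16 and 17 rely on converting an A260 reading into a DNA concentration and then into a loading volume. The sketch below illustrates that arithmetic. It assumes the common rule of thumb that one A260 unit corresponds to roughly 50 µg/ml of double-stranded DNA at a 1 cm path length, which the text itself does not state. The function names and the example reading are hypothetical.

```python
# Assumed conversion factor: 1 A260 unit ~ 50 ug/ml dsDNA (1 cm path), i.e. 50 ng/ul.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """Estimate the dsDNA concentration of the undiluted sample in ng/ul."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_cm


def volume_for_mass_ul(mass_ng, conc_ng_per_ul):
    """Volume in ul that contains mass_ng of DNA at the given concentration."""
    return mass_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.2 measured on a 1:50 dilution of the prep.
    conc = dsdna_conc_ng_per_ul(0.2, dilution_factor=50)
    for load_ng in (50, 100):
        print(f"{load_ng} ng -> {volume_for_mass_ul(load_ng, conc):.2f} ul of stock")
```

For concentrated preparations the loading volumes become very small, so an intermediate dilution of the stock before loading is usually more practical.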

Protocol for Digestion of Genomic DNA 

Protocol : 

1. The relative amounts of DNA for different crop plants that provide approximately a
balanced number of genome equivalents are given in Table 3. Note that due to the size
of the wheat genome, wheat DNA will be underrepresented. Lambda DNA provides 
a useful control for complete digestion. 

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at 
least two hours. Yeast DNA can be purchased and made up at the necessary 
concentration, therefore no precipitation is necessary for yeast DNA. 

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be
careful not to disturb the pellet). Be sure that the residual ethanol is completely 
removed either by vacuum desiccation or by carefully wiping the sides of the tubes 
with a clean tissue. 

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully 
resuspended before proceeding to the next step. This may take about 30 min. 

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of
the restriction enzyme to the resuspended DNA followed by the appropriate volume
of enzymes. Be sure to mix it properly by slowly swirling the tubes.

6. Set up the lambda digestion-control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down 
condensation in a microfuge before proceeding. 

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25%
xylene cyanol in 15% Ficoll or 30% glycerol) to the lambda-control digests and load
in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2 mM EDTA, pH 8). If the
lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at
-20°C for at least 2 hours (preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; 
they don't have to be precipitated. 

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl)
and an appropriate volume of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be
careful in pipetting the loading dye - it is viscous. Be sure you are pipetting the
correct volume.

Table 3

Some guide points in digesting genomic DNA.

Species       Genome      Size Relative    Genome Equivalent    Amount of DNA
              Size        to Arabidopsis   to 2 µg Arabidopsis  per blot
                                           DNA
Arabidopsis   120 Mb      1X               1X                   2 µg
Brassica      1,100 Mb    9.2X             0.54X                10 µg
Corn          2,800 Mb    23.3X            0.43X                20 µg
Cotton        2,300 Mb    19.2X            0.52X                20 µg
Oat           11,300 Mb   94X              0.11X                20 µg
Rice          400 Mb      3.3X             0.75X                5 µg
Soybean       1,100 Mb    9.2X             0.54X                10 µg
Sugarbeet     758 Mb      6.3X             0.8X                 10 µg
Sweetclover   1,100 Mb    9.2X             0.54X                10 µg
Wheat         16,000 Mb   133X             0.08X                20 µg
Yeast         15 Mb       0.12X            1X                   0.25 µg
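
The two derived columns of Table 3 appear to follow from the genome size and the amount of DNA loaded per blot. The sketch below shows one way to reproduce them. It is an interpretation of the table rather than a formula given in the text, and the function names are illustrative.

```python
ARABIDOPSIS_GENOME_MB = 120.0   # reference genome size from Table 3
ARABIDOPSIS_UG_PER_BLOT = 2.0   # reference amount of DNA per blot from Table 3

# (genome size in Mb, micrograms of DNA per blot), taken from Table 3.
TABLE3 = {
    "Arabidopsis": (120, 2),
    "Brassica": (1100, 10),
    "Corn": (2800, 20),
    "Cotton": (2300, 20),
    "Oat": (11300, 20),
    "Rice": (400, 5),
    "Soybean": (1100, 10),
    "Sugarbeet": (758, 10),
    "Sweetclover": (1100, 10),
    "Wheat": (16000, 20),
    "Yeast": (15, 0.25),
}


def relative_size(genome_mb):
    """Genome size expressed as a multiple of the Arabidopsis genome."""
    return genome_mb / ARABIDOPSIS_GENOME_MB


def genome_equivalents(genome_mb, ug_per_blot):
    """Genome copies loaded, relative to 2 ug of Arabidopsis DNA."""
    mass_ratio = ug_per_blot / ARABIDOPSIS_UG_PER_BLOT
    return mass_ratio / relative_size(genome_mb)


if __name__ == "__main__":
    for species, (mb, ug) in TABLE3.items():
        print(f"{species:12s} {relative_size(mb):7.2f}X {genome_equivalents(mb, ug):5.2f}X")
```

The computed values match the table to within rounding, and they make the remark about wheat concrete: 20 µg of wheat DNA carries less than a tenth of the genome copies present in 2 µg of Arabidopsis DNA.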



Protocol for Southern Blot Analysis 

The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer.
Low-voltage, overnight separations are preferred. The gels are stained with EtBr and
photographed.



1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for
about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate
(with shaking) in 0.5 M NaOH in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with
shaking) in 1.5 M Tris pH 7.5 in 1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X
SSC for at least 15 min. before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate
per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are
removed. The DNA is blotted from the gel to the membrane using an absorbent
medium, such as paper toweling and 6X SSC buffer. After the transfer, the membrane
may be lightly brushed with a gloved hand to remove any agarose sticking to the
surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The
membrane is stored at 4°C until use.

B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:



1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

   Volume        Stock                                      Final Amount or Conc.
   0.5 µl        ~10 ng/µl genomic DNA (1)                  5 ng
   2.5 µl        10X PCR buffer                             20 mM Tris, 50 mM KCl
   0.75 µl       50 mM MgCl2                                1.5 mM
   1 µl          10 pmol/µl Primer 1 (Forward)              10 pmol
   1 µl          10 pmol/µl Primer 2 (Reverse)              10 pmol
   0.5 µl        5 mM dNTPs                                 0.1 mM
   0.1 µl        5 units/µl Platinum Taq™ (Life             1 unit
                 Technologies, Gaithersburg, MD)
                 DNA Polymerase
   (to 25 µl)    Water

   (1) Arabidopsis DNA is used in the present experiment, but the procedure is a general one.

2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

   1) 94°C for 10 min. followed by

   2) 5 cycles:         3) 5 cycles:         4) 25 cycles:
      94°C - 30 sec        94°C - 30 sec        94°C - 30 sec
      62°C - 30 sec        58°C - 30 sec        53°C - 30 sec
      72°C - 3 min         72°C - 3 min         72°C - 3 min

   5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.



The procedure can be adapted to a multi-well format if necessary. 
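
As an illustration of the amplification set-up above, the following sketch computes the water volume needed to reach 25 µl, checks two of the final concentrations against the stock volumes, and sums the programmed hold times of the step-down cycling profile. The data structures and function names are hypothetical, and ramp times between temperatures are ignored.

```python
REACTION_UL = 25.0

# Component volumes in ul, taken from the reaction table.
VOLUMES_UL = {
    "genomic DNA": 0.5,
    "10X PCR buffer": 2.5,
    "50 mM MgCl2": 0.75,
    "primer 1": 1.0,
    "primer 2": 1.0,
    "5 mM dNTPs": 0.5,
    "Platinum Taq": 0.1,
}

# Cycling program as (repeat count, [(temperature in C, seconds), ...]).
PROGRAM = [
    (1, [(94, 600)]),
    (5, [(94, 30), (62, 30), (72, 180)]),
    (5, [(94, 30), (58, 30), (72, 180)]),
    (25, [(94, 30), (53, 30), (72, 180)]),
    (1, [(72, 420)]),
]


def water_volume_ul(volumes=VOLUMES_UL, total=REACTION_UL):
    """Water needed to bring the reaction to its final volume."""
    return total - sum(volumes.values())


def final_conc(stock_conc, volume_ul, total=REACTION_UL):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock_conc * volume_ul / total


def hold_time_min(program=PROGRAM):
    """Total programmed hold time in minutes (excludes ramping and the 4 C hold)."""
    seconds = sum(n * sum(sec for _, sec in steps) for n, steps in program)
    return seconds / 60.0


if __name__ == "__main__":
    print("water (ul):", round(water_volume_ul(), 2))
    print("MgCl2 (mM):", final_conc(50.0, 0.75))
    print("dNTPs (mM):", final_conc(5.0, 0.5))
    print("program (min):", hold_time_min())
```

For a multi-well format, multiplying each entry of `VOLUMES_UL` by the number of wells plus a small excess gives master-mix volumes.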
Quantification and Dilution of PCR Products: 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A
linearized plasmid DNA can be used as a quantification standard (usually at 50, 100,
200, and 400 ng). These will be used as references to approximate the amount of
PCR products. HindIII-digested Lambda DNA is useful as a molecular weight
marker. The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is
examined to determine that the size of the PCR products is consistent with the
expected size and whether there are significant extra bands or smeary products in the PCR
reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard. 

3. For the small number of reactions that produce extraneous bands, a small amount of
DNA from bands with the correct size can be isolated by dipping a sterile 10-µl tip
into the band while viewing through a UV Transilluminator. The small amount of
agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA 

Solutions:

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5
U/µl Platinum Taq Polymerase, and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65
mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81
mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875
mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid
and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir
for 15 min. and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176). Heat to 60°C
while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir
and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from
autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved
distilled water.
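
The three 10X dNTP + DIG-11-dUTP mixes listed above differ only in how the 2 mM "T" pool is split between dTTP and DIG-11-dUTP. The sketch below checks that split. It is a consistency check on the listed concentrations, not part of the original protocol, and the names are illustrative.

```python
# (dTTP in mM, DIG-11-dUTP in mM) for each labeled mix, as listed in the Solutions.
DIG_MIXES = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}
OTHER_DNTP_MM = 2.0  # dATP, dCTP and dGTP are each 2 mM in every mix


def total_t_pool_mM(dttp_mM, dig_mM):
    """Combined dTTP + DIG-11-dUTP concentration in mM."""
    return dttp_mM + dig_mM


def dttp_per_dig(dttp_mM, dig_mM):
    """Molar ratio of unlabeled dTTP to DIG-11-dUTP."""
    return dttp_mM / dig_mM


if __name__ == "__main__":
    for label, (dttp, dig) in DIG_MIXES.items():
        print(f"[{label}] T pool = {total_t_pool_mM(dttp, dig):.3f} mM, "
              f"dTTP:DIG = {dttp_per_dig(dttp, dig):.1f}:1")
```

In each mix the dTTP and DIG-11-dUTP concentrations sum to the same 2 mM used for the other three nucleotides, so only the labeling density changes between mixes. The bracketed labels are nominal: the computed dTTP to DIG-11-dUTP ratio is exactly 15 for the [1:15] mix and somewhat below the nominal value for the other two.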
Procedure:

1. PCR reactions are performed in 25 µl volumes containing:

   PCR buffer                      1X
   MgCl2                           1.5 mM
   10X dNTP + DIG-11-dUTP          1X (please see the note below)
   Platinum Taq™ Polymerase        1 unit
   10 pg probe DNA
   10 pmol primer 1

   Note:                              Use for:
   10X dNTP + DIG-11-dUTP (1:5)       < 1 kb
   10X dNTP + DIG-11-dUTP (1:10)      1 kb to 1.8 kb
   10X dNTP + DIG-11-dUTP (1:15)      > 1.8 kb


2. The PCR reaction uses the following amplification cycles:

   1) 94°C for 10 min.

   2) 5 cycles:         3) 5 cycles:         4) 25 cycles:
      95°C - 30 sec        95°C - 30 sec        95°C - 30 sec
      61°C - 1 min         59°C - 1 min         51°C - 1 min
      73°C - 5 min         75°C - 5 min         73°C - 5 min

   5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).

3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an
aliquot of the unlabelled probe starting material.



4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris
and 1 mM EDTA, pH 8) as shown in the following table:

   DIG-labeled control      Stepwise Dilution       Final Conc.
   DNA starting conc.                               (Dilution Name)
   5 ng/µl                  1 µl in 49 µl TE        100 pg/µl (A)
   100 pg/µl (A)            25 µl in 25 µl TE       50 pg/µl (B)
   50 pg/µl (B)             25 µl in 25 µl TE       25 pg/µl (C)
   25 pg/µl (C)             20 µl in 30 µl TE       10 pg/µl (D)


a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg
are spotted onto a positively charged nylon membrane, marking the membrane
lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe
are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then
incubated in 1% blocking solution for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG
antibody (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) and
an NBT substrate according to the manufacturer's instructions.

Spot intensities of the control and experimental dilutions are then compared to
estimate the concentration of the PCR-DIG-labeled probe.
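
The dilution arithmetic behind the control series and the spot comparison can be sketched as follows. This is a minimal illustration, not part of the described method: the function names are hypothetical, and the example of a 1:50 probe dilution matching standard C is an invented reading used only to show the calculation.

```python
from fractions import Fraction

def dilute(conc_pg_per_ul, sample_ul, te_ul):
    """Concentration after mixing sample_ul of DNA solution with te_ul of TE."""
    return conc_pg_per_ul * Fraction(sample_ul, sample_ul + te_ul)

# Stepwise dilutions from the table; the control starts at 5 ng/ul = 5000 pg/ul.
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]

standards = {}
conc = Fraction(5000)
for name, sample_ul, te_ul in STEPS:
    conc = dilute(conc, sample_ul, te_ul)
    standards[name] = conc  # pg/ul

def estimate_probe_conc(matched_standard, probe_dilution_factor):
    """Estimate the undiluted probe concentration (pg/ul).

    Assumes equal spot volumes, so a probe dilution whose spot intensity
    matches a standard has the same concentration as that standard.
    """
    return standards[matched_standard] * probe_dilution_factor

# Hypothetical reading: the 1:50 probe dilution looks like standard C.
print({k: float(v) for k, v in standards.items()})
print(float(estimate_probe_conc("C", 50)), "pg/ul")
```

The exact fractions avoid rounding drift across the four steps; the visual comparison of spots remains the limiting source of error.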
D. Prehybridization and Hybridization of Southern Blots

Solutions:

100% Formamide purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3 citrate)

per L: 175 g NaCl
87.5 g Na3 citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)

20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder. Heat to 60°C while stirring to dissolve the powder. Adjust the volume
to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final Concentration   Components         Volume (per 100 ml)   Stock
50%                   Formamide          50 ml                 100%
5X                    SSC                25 ml                 20X
0.1%                  Sarkosyl           0.5 ml                20%
0.02%                 SDS                0.1 ml                20%
2%                    Blocking Reagent   20 ml                 10%
                      Water              4.4 ml
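
The volumes in the prehybridization mix follow from the usual C1·V1 = C2·V2 relation. The sketch below recomputes them for an arbitrary batch size; the function and dictionary names are hypothetical, and water is assumed to make up the remaining volume.

```python
# component: (final concentration, stock concentration), both in the same unit
RECIPE = {
    "Formamide": (50, 100),        # %
    "SSC": (5, 20),                # X
    "Sarkosyl": (0.1, 20),         # %
    "SDS": (0.02, 20),             # %
    "Blocking Reagent": (2, 10),   # %
}

def prehyb_mix(total_ml=100.0):
    """Return the volume (ml) of each stock needed for total_ml of mix."""
    volumes = {
        name: round(final / stock * total_ml, 3)
        for name, (final, stock) in RECIPE.items()
    }
    # Water fills whatever volume the stocks leave over.
    volumes["Water"] = round(total_ml - sum(volumes.values()), 3)
    return volumes

for component, ml in prehyb_mix(100.0).items():
    print(f"{component:18s}{ml:8.1f} ml")
```

For a 100 ml batch this reproduces the table, including the 4.4 ml of water.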





General Procedures:

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of
prehybridization solution (30 ml/100 cm²) at room temperature. Seal the bag with a
heat sealer, avoiding bubbles as much as possible. Lay down the bags in a large
plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are
lying flat in the tray so that the prehybridization solution is evenly distributed
throughout the bag. Incubate the blot for at least 2 hours with gentle agitation using a
waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR 
machine and immediately cool it to 4°C. 

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and 
mix well but avoid foaming. Bubbles may lead to background. 

4. Pour off the prehybridization solution from the hybridization bags and add new 
prehybridization and probe solution mixture to the bags containing the membrane. 

5. Incubate with gentle agitation for at least 16 hours. 

6. Proceed to medium stringency post-hybridization wash: 

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution 
per membrane. 

To avoid background, keep the membranes fully submerged so that they do not dry in spots;
agitate sufficiently to keep the membranes from sticking to one another.

7. After the wash, proceed to immunological detection and CSPD development.
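
The two scaling rules in the steps above (30 ml of solution per 100 cm² of membrane, and 25 ng of probe per ml) can be combined into a small calculation. This is only a sketch; the function names and the 150 cm² example membrane are assumptions for illustration.

```python
def prehyb_volume_ml(membrane_cm2, ml_per_100_cm2=30.0):
    """Prehybridization solution volume for a membrane of the given area."""
    return membrane_cm2 / 100.0 * ml_per_100_cm2

def probe_amount_ng(volume_ml, ng_per_ml=25.0):
    """Total DIG-labeled probe needed for the given hybridization volume."""
    return volume_ml * ng_per_ml

# Hypothetical 150 cm² membrane
volume = prehyb_volume_ml(150)
print(volume, "ml of solution;", probe_amount_ng(volume), "ng of probe")
```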

E. Procedure for Immunological Detection with CSPD 

Solutions: 



Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl;
adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1.
Dissolve blocking reagent powder (Boehringer Mannheim, Indianapolis, IN,
cat. no. 1096176) by constantly stirring on a 65°C heating block or heat in a
microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5



Procedure:

1. After the post-hybridization wash, the blots are briefly rinsed (1-5 min.) in the maleate
washing buffer with gentle shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking. 



3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274)
at 75 mU/ml (1:10,000) in Buffer 2 is used for detection. 75 ml of solution can be
used for 3 blots.



4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking. 



5. The membranes are washed twice in washing buffer with gentle shaking. About 250
ml is used per wash for 3 blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and
stored in the dark at 4°C.)



Reference No. 2750-942P 




The following steps must be done individually. Bags (one for detection and one for 
exposure) are generally cut and ready before doing the following steps. 

8. The blot is carefully removed from the detection buffer and excess liquid removed 
without drying the membrane. The blot is immediately placed in a bag and 1.5 ml of
CSPD solution is added. The CSPD solution can be spread over the membrane. 
Bubbles present at the edge and on the surface of the blot are typically removed by 
gentle rubbing. The membrane is incubated for 5 min. in CSPD solution. 

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on 
Whatman 3 MM paper. Do not let the membrane dry completely. 

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to
enhance the luminescent reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be
taken. Luminescence continues for at least 24 hours and signal intensity increases
during the first hours.

Example 3: Transformation of Carrot Cells 

Transformation of plant cells can be accomplished by a number of methods, as 
described above. Similarly, a number of plant genera can be regenerated from tissue culture 
following transformation. Transformation and regeneration of carrot cells as described herein 
is illustrative. 

Single cell suspension cultures of carrot (Daucus carota) cells are established from
hypocotyls of cultivar Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant
Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods known in
the art. The suspension cultures are subcultured by adding 10 ml of the suspension culture to
40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150
rpm at 27°C in the dark.

The suspension culture cells are transformed with exogenous DNA as described by Z.
Chen et al., Plant Mol. Bio. 36:163 (1998). Briefly, cells 4 days post-subculture are incubated
with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, 5 mM MES
(2-[N-Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are pelleted gently
at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl,
125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution
containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7 and the protoplast density is
adjusted to about 4 x 10^6 protoplasts per ml.
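
Adjusting the density is a simple proportion once the protoplasts have been counted. The sketch below is illustrative only: the function names are hypothetical, the total count of 2 x 10^7 is an invented example, and the 0.9 ml aliquot is the protoplast volume used per transformation in this example.

```python
TARGET_PER_ML = 4e6  # about 4 x 10^6 protoplasts per ml, as stated in the text

def resuspension_volume_ml(total_protoplasts, target_per_ml=TARGET_PER_ML):
    """Volume of MC solution in which to suspend the pellet to hit the target density."""
    return total_protoplasts / target_per_ml

def protoplasts_per_aliquot(aliquot_ml=0.9, density_per_ml=TARGET_PER_ML):
    """Number of protoplasts contained in one transformation aliquot."""
    return aliquot_ml * density_per_ml

# Hypothetical count: 2 x 10^7 protoplasts recovered after the W5 washes
print(resuspension_volume_ml(2e7), "ml of MC solution")
print(f"{protoplasts_per_aliquot():.2e} protoplasts per 0.9 ml aliquot")
```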

15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting
suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000), by gentle
inversion a few times at room temperature for 5 to 25 min. Protoplast culture medium known
in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated in the
culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient
expression of the introduced gene. Alternatively, transformed cells can be used to produce
transgenic callus, which in turn can be used to produce transgenic plants, by methods known
in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:988-991 (1985),
Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot
Suspension Cultures.

An additional deposit, PTA-1411, of an E. coli Library, E. coli LibA021800, was
made at the American Type Culture Collection in Manassas, Virginia, USA on February 22,
2000 to meet the requirements of the Budapest Treaty for the international recognition of the
deposit of microorganisms. This deposit was assigned ATCC accession no. PTA-1411.

The invention being thus described, it will be apparent to one of ordinary skill in the
art that various modifications of the materials and methods for practicing the invention can be
made. Such modifications are to be considered within the scope of the invention as defined
by the following claims.

Each of the references from the patent and periodical literature cited herein is hereby
expressly incorporated in its entirety by such citation.
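
The exon tables that follow use a simple record layout: a header line of the form `>sequence /model`, a `len = ... nex = ...` line, and one row per exon (`Init`, `Intr`, `Term` or `Sngl`) with start and end coordinates. A minimal parser for that layout might look like the sketch below. It is an illustration under stated assumptions, not a tool described in this document: the function names are hypothetical, and the observation that `len` matches the span from the lowest to the highest exon coordinate is inferred from the listed numbers rather than stated anywhere in the text.

```python
import re

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}

def parse_models(text):
    """Parse '>seq /model' records into dicts with a length and exon rows."""
    models, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            model_id = parts[1].lstrip("/") if len(parts) > 1 else None
            current = {"seq": parts[0], "model": model_id, "len": None, "exons": []}
            models.append(current)
        elif current is not None and line.startswith("len"):
            match = re.search(r"len\s*=\s*(\d+)", line)
            if match:
                current["len"] = int(match.group(1))
        elif current is not None:
            fields = line.split()
            if fields[0] in EXON_TYPES and len(fields) >= 3:
                current["exons"].append((fields[0], int(fields[1]), int(fields[2])))
    return models

def span(model):
    """Distance from the lowest to the highest exon coordinate, inclusive."""
    coords = [c for _, start, end in model["exons"] for c in (start, end)]
    return max(coords) - min(coords) + 1 if coords else 0

SAMPLE = """\
>2244788 /4905
len = 1532 nex =
Init 134547 134633
Intr 134785 134908
Intr 135250 135306
Term 135918 136078
"""

for m in parse_models(SAMPLE):
    print(m["seq"], m["model"], m["len"], span(m), len(m["exons"]))
```

Comparing `len` with the computed span is a convenient sanity check on rows whose digits may have been damaged in transcription.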
59446 59412
Intr 59994 59535
Intr 60270 60075
Init 60799 60608



>2244747 /34967
len = 1692 nex =
Init 78644 78978
Intr 79811 79967
Term 80055 80335

>2244747 /29662
len = 2324 nex =
Term 6181 5707
Intr 6376 6275
Intr 6858 6468
Init 8030 7268

>2244747 /10852
len = 948 nex =
Term 95484 95087
Intr 95756 95563
Init 96034 95845

>2244747 /33554
len = 1225 nex =
Term 95484 94981
Intr 95756 95563
Init 96205 95845

>2244788 /33860
len = 894 nex =
Init 119066 119340
Term 119433 119959

>2244788 /4232
len = 1570 nex =
Term 11837 11610
Intr 12997 12874
Init 13171 13086



>2244788 /20129

len = 1736 nex =



Init 134496 134633 

Intr 134785 134908 

Intr 135250 135306 

Term 135918 136231 

>2244788 /4905 

len = 1532 nex = 

Init 134547 134633 

Intr 134785 134908 

Intr 135250 135306 

Term 135918 136078 

>2244788 /18255 

len = 1917 nex = 

Term 11837 11553 

Intr 12997 12874 

Intr 13171 13086 

Init 13469 13401 

>2244788 /42223 

len = 1270 nex = 



Init 141770 141970
Term 142713 143034

>2244788 /21908
len = 865 nex =
Term 172609 172540
Init 173404 172806

>2244788 /95834



len = 932 nex = 

Init 176283 176507 

Intr 176602 176703 

Intr 176785 176939 

Term 176951 177214 

>2244788 /31495 

len = 1150 nex = 

Init 177820 177887 

Intr 178110 178208 

Intr 178295 178347 

Intr 178445 178518 

Term 178797 178969 

>2244788 /40073 

len = 1761 nex = 



Term 182960 182681 0
Intr 183144 183074 0
Intr 183352 183228 0
Intr 183544 183430 0
Init 184441 183731 0

>2244788 /2738
len = 1855 nex = 7
Term 182960 182701 0
Intr 183144 183074 0
Intr 183352 183228 0
Intr 183544 183430 0
Intr 183825 183731 0
Intr 184012 183901 0
Init 184555 184343 0

>2244788 /18153
len = 1337 nex = 4
Term 744 549 0
Intr 903 829 0
Intr 1232 1053 0
Init 1885 1804 0

>2244788 /16319
len = 1732 nex = 7
Term 188526 188214 0
Intr 188710 188640 0
Intr 188914 188790 0
Intr 189112 188998 0
Intr 189340 189246 0
Intr 189532 189421 0
Init 189945 189850 0

>2244788 /34477
len = 790 nex = 4
Term 26188 26035 0
Intr 26496 26276 0
Intr 26702 26590 0
Init 26822 26779 0

>2244788 /37809
len = 2215 nex = 10
Term 29960 29503 0
Intr 30139 30054 0
Intr 30309 30235 0
Intr 30490 30388 0
Intr 30687 30606 0
Intr 30881 30790 0
Intr 31057 30969 0
Intr 31236 31156 0



Intr 31450 31336 - 0
Init 31717 31579 - 0

>2244788 /9870
len = 1700 nex =
Term 45280 45046 0
Intr 45431 45380 - 0
Intr 45545 45518 0
Intr 46149 46080 0
Intr 46413 46313 - 0
Init 46745 46519 - 0

>2244788 /40736
len = 1713 nex = 5
Init 57948 58133 + 0
Intr 58560 58765 + 0
Intr 58850 58930 + 0
Intr 59012 59174 0
Term 59262 59660 + 0

>2244788 /1718
len = 1844 nex = 5
Term 60276 59985 - 0
Intr 60467 60369 0
Intr 60644 60555 0
Intr 60856 60742 - 0
Init 61828 61672 - 0

>2244788 /94503
len = 1930 nex = 5
Term 60276 59949 - 0
Intr 60467 60369 0
Intr 60644 60555 - 0
Intr 60856 60742 - 0
Init 61875 61672 - 0

>2244788 /28978
len = 921 nex = 1
Sngl 63706 62786 - 0

>2244788 /36844
len = 1309 nex = 1
Sngl 78815 80123 + 0

>2244788 /42933
len = 2960 nex = 6



Init 92232 92765 +
Intr 92959 93121 +
Intr 93567 93743
Intr 93831 93914 +
Intr 94438 94519
Term 94602 95191 +

>2244829 /38042
len = 2717 nex = 9
Init 103735 104049 +
Intr 104329 104423
Intr 104545 104609 +
Intr 104833 104876 +
Intr 105212 105295 +
Intr 105486 105639 +
Intr 105738 105920 +
Intr 106013 106069
Term 106159 106451 +

>2244829 /293
len = 315 nex = 1
114012 113698

/40074
len = 1498 nex = 2
Term 115095 113973 -
Init 115470 115294

/38411
len = 1796 nex = 2
Term 115095 113698 -
Init 115493 115294

>2244829 /10518
len = 2190 nex = 8
Init 116378 116531 0
Intr 116787 116872 + 0
Intr 116953 117024 + 0
Intr 117143 117180 0
Intr 117526 117569 + 0
Intr 117791 117837 + 0
Intr 117992 118166 0
Term 118269 118567 + 0

>2244829 /29288
len = 492 nex = 1
Sngl 131227 130736 0



>2244829 /24175
len = 332 nex =
Sngl 136899 137230

>2244829 /17179
len = 450 nex =
Sngl 136899 137332

>2244829 /99S23
len = 346 nex =
Sngl 136900 137245

>2244829 /37184
len = 624 nex =

>2244829 /126602
len = 654 nex =
Sngl 136900 137553

>2244829 /15384
len = 627 nex =
Sngl 136904 137530

>2244829 /26797
len = 628 nex =
Sngl 136904 137531

>2244829 /36129
len = 739 nex =
Sngl 199828 200566

>2244829 /24266
len = 1908 nex =
Init 65354 65621
Intr 65713 65836
Term 66807 67261

>2244829 /31856
len = 897 nex =
Init 70117 70500
Intr 70585 70611



Term 70696 71013 +

>2244829 /30327
len = 711 nex = 1
Sngl 82258 82968 +

>2244829 /33166
len = 650 nex = 1
Sngl 82303 82952

>2244829 /42848
len = 2473 nex = 9
Term 83367 83062 -
Intr [illegible]
Intr 83703 83644 -
Intr 83890 83811 -
Intr 84071 84020 -
Intr [illegible]
Intr 84661 84398 -
Intr [illegible]
Init 84996 84887 -

>2244829 /22861
len = 611 nex = 1
Sngl 85902 86512 +

>2244829 /25333
len = 2115 nex = 3
Term 87340 86629 -
Intr 87618 87443 -
Init 88743 87767 -

>2244829 /117350
len = 1760 nex = 8
Term 93545 93422 -
Intr 93819 93710 -
Intr
Intr 94168 94094 -
Intr [illegible]
Intr 94573 94469 -
Intr 94861 94740
Init 95181 94950

>2244870 /2163
len = 1517 nex = 1
Sngl 13507 15023 +



>2244870 /15641
len = 1853 nex =
Init 2352 2569
Intr 2668 2781
Intr 2862 2957
Intr 3057 3099
Intr 3174 3326
Intr 3408 3476
Term 3843 4204

>2244870 /35290
len = 1090 nex =
Term 33366 33045
Init 34113 33943

>2244870 /18642
len = 867 nex =
Term 4431 4071
Init 4937 4513

>2244870 /30852
len = 513 nex =
Sngl 70945 70433

>2244870 /36205
len = 1210 nex =
Sngl 71644 70435

>2244870 /30929
len = 867 nex =
Sngl 84563 85414

>2244901 /32219
len = 644 nex =
Sngl 100297 100940

>2244901 /101301
len = 1235 nex =
Init 12251 12597
Term 13371 13485

>2244901 /15334



len = 2089 nex = 4
Init 12251 12597 0
Intr 13371 13484 + 0
Intr 13678 13835 + 0
Term 13944 14339 + 0

[illegible] /14485
len = 1048 nex = 2
Term 136645 136202 - 0
Init 137249 136976 0

[illegible] /8916
len = 761 nex = 2
Init 146636 146871 + 0
Term 146912 147396 + 0

>2244901 /22637
len = 1930 nex = 7
Init 150934 151112 + 0
Intr 151807 151845 + 0
Intr 151938 151991 + 0
Intr 152091 152144 0
Intr 152269 152322 0
Intr 152417 152488
Term 152622 152862 + 0

>2244901 /5455
len = 550 nex = 1
Sngl 153514 154059 + 0

>2244901 /25390
len = 1731 nex = 3
Term 156239 156216 - 0
Intr 156385 156332 0
Init 157099 156997 - 0

>2244901 /39757
len = 1489 nex = 5
Term 164193 163773 0
Intr 164487 164293 0
Intr 164750 164603 0
Intr 164938 164832 0
Init 165261 165017 0

>2244901 /113295
len = 250 nex = 1



Sngl 165261 165021 0

>2244901 /43007
len = 3418 nex = 9
Init 181307 182180 0
Intr 182482 182558 + 0
Intr 182639 182732 + 0
Intr 182817 182915 + 0
Intr 183212 183301 0
Intr 183400 183519 + 0
Intr 183767 183870 + 0
Intr 184163 184235 + 0
Term 184397 184724 + 0

>2244901 /8381
len = 928 nex = 2
Init 197128 197392 + 0
Term 197699 198055 0

>2244901 /35383
len = 1690 nex = 1
Sngl 23032 21343 - 0

>2244901 /12451
len = 2050 nex = 4
Init 29261 29459 + 0
Intr 29681 29785 + 0
Intr 29969 30397 + 0
Term 30959 31303 + 0

>2244901 /8234
len = 855 nex = 4
Term 33518 33296 0
Intr 33802 33633 - 0
Intr 34017 33880 0
Init 34150 34103 0

>2244901 /33073
len = 3028 nex = 2
Init 4164 4631 + 0
Term 6071 7191 + 0

>2244901 /307
len = 1838 nex =
Init 44565 44888



Intr 44976 45044 + 0
Intr 45145 45198 + 0
Intr 45288 45327 + 0
Intr 45414 45512 + 0
Intr 45595 45819 + 0
Intr 45902 46023 + 0
Term 46120 46402 + 0

>2244901 /19122
len = 1766 nex = 8
44638 44888 + 0
Intr 44976 45044 + 0
45145 45198 0
Intr 45288 45327 + 0
Intr 45414 45512 + 0
Intr 45595 45819 + 0
Intr 45902 46023 + 0
Term 46120 46403 + 0

>2244901 /37345
len = 1379 nex = 3
Init 55027 55308 + 0
Intr 55387 55671 0
Term 55759 56179 + 0

>2244901 /26019
len = 1750 nex = 3
Init 77747 78039 + 0
Intr 78780 78906 + 0
Term 79065 79492 + 0

/933
len = 1415 nex =
Init 86075 86413 + 0
Term 86998 87489 + 0

>2244950 /12629
len = 3346 nex = 10
Term 100982 100625 - 0
Intr 101466 101106 - 0
Intr 101718 101591 0
Intr 102002 101874 0
Intr 102439 102360 0
Intr 102690 102527 0
Intr 102958 102773 0
Intr 103205 103074 0
Intr 103432 103291 0
Init 103970 103568 0

>2244950 /40414



len = 2150 nex = 6
Term 109338 109067 -
Intr 109551 109489
Intr 109708 109646 -
Intr 109850 109803 -
Intr 110001 109939 -
Init 111043 110961 -

>2244950 /30227
len = 2050 nex = 6
Term 109338 109187 - 0
Intr 109551 109489 - 0
Intr 109708 109646 - 0
Intr 109850 109803 - 0
Intr 110001 109939 - 0
Init 111043 110961 - 0

>2244950 /5714
len = 1403 nex = 7
Init 124186 124326 + 0
Intr 124418 124469 + 0
Intr 124596 124670 + 0
Intr 124766 124794 + 0
Intr 124968 125001 + 0
Intr 125082 125152 + 0
Term 125251 125588 + 0

>2244950 /33513
len = 1593 nex = 4
Init 138127 138644 + 0
Intr 138739 138858 + 0
Intr 138934 139180 0
Term 139256 139719 + 0

>2244950 /19028
len = 638 nex = 2
Init 139024 139180 + 0
Term 139256 139661 0

>2244950 /21894
len = 1030 nex = 1
Sngl 146832 145803

>2244950 /7605
len = 814 nex = 2
Term 167332 166714 0






Init 167527 167451 

>2244950 /3176 

len = 1423 nex = 

Term 167332 166764 

Init 167934 167451 

>2244950 /41791 

len = 1479 nex = 

Term 167332 166712 

Intr 167934 167451 

Init 168190 168116 

>2244950 /12256 

len = 1716 nex = 

Term 169269 169015 

Intr 169606 169448 

Intr 170335 170260 

Init 170730 170607 

>2244950 /6723 

len = 1536 nex = 

Init 171676 171958 

Intr 172224 172415 

Intr 172496 172661 

Term 172740 173211 

>2244950 /124835 

len = 978 nex = 

Sngl 18831 19808 

>2244950 /40793 

len = 1247 nex = 

Term 193189 192906 

Intr 193587 193266 

Init 194152 193673 

>2244950 /2803 

len = 1824 nex = 

Init 2896 3184 

Intr 3571 3676 

Term 4403 4719 

>2244950 /9209 

len = 573 nex = 



Sngl 31137 30565

>2244950 /2965S
len = 682 nex = 1
Sngl 34486 35167 + 0

>2244950 /40913
len = 2079 nex = 7
Init 4949 5128 + 0
Intr 5254 5419 + 0
Intr 5498 5550 + 0
Intr 5911 5973 + 0
Intr 6366 6416 + 0
Intr 6516 6630 + 0
Term 6687 7027 + 0

>2244950 /18234
len = 1950 nex = 6
Init 61059 61335 + 0
Intr 61420 61550 + 0
Intr 61714 61791 + 0
Intr 61882 61926 + 0
Intr 62016 62060 + 0
Term 62293 62389 + 0

>2244950 /32203
len = 1510 nex = 6
Init 7376 7454 + 0
Intr 7542 7577 + 0
Intr 7707 7844 + 0
Intr 7939 8012 + 0
Intr 8418 8486 + 0
Term 8556 8884 + 0

>2244950 /31782
len = 1211 nex = 1
Sngl 84183 82973 0

>2244950 /17019
len = 2897 nex = 2
Term 84672 82981 0
Init 85877 85235 0

>2244950 /109560
len = 397 nex = 1



Sngl 95604 96000 + 0

>2244991 /7101
len = 1300 nex = 5
Term 99473 99160 - 0
Intr 99674 99597 0
Intr 99851 99788 - 0
Intr 100015 99939 0
Init 100216 100170 - 0

>2244991 /14136
len = 1251 nex = 1
Sngl 133001 131751 - 0

>2244991 /24611
len = 1275 nex = 6
Init 144816 144916 + 0
Intr 144996 145065 + 0
Intr 145153 145209 + 0
Intr 145299 145360 + 0
Intr 145408 145507 + 0
Term 145593 145964 + 0

>2244991 /5546
len = 1163 nex = 3
Term 157187 156808 - 0
Intr 157430 157305 0
Init 157970 157545 - 0

>2244991 /8212
len = 1254 nex = 0

>2244991 /40778
len = 879 nex = 3
Init 163368 163492 + 0
Intr 163658 163757 + 0
Term 163863 164240 + 0

>2244991 /23771
len = 1377 nex = 4
Term 164902 164507 0
Intr 165186 164989 0
Intr 165666 165500 0
Init 165883 165813 0

>2244991 /16525



len = 810 nex = 2
Init 172277 172503 + 0
Term 172604 173086 + 0

>2244991 /22084
len = 1450 nex = 2
Init 177203 177333 0
Term 177407 177827 + 0

>2244991 /157870
len = 342 nex = 1
Sngl 17882 17541 - 0

>2244991 /5686
len = 2453 nex = 10
Term 194540 194396 - 0
Intr 194759 194680 - 0
Intr 194888 194843 0
Intr 195027 194971 0
Intr 195163 195105 - 0
Intr 195344 195244 - 0
Intr 195623 195502 0
Intr 195980 195929 - 0
Intr 196138 196058 0
Init 196848 196213 - 0

>2244991 /2505
len = 623 nex = 1
Sngl 27093 26471 - 0

>2244991 /7632
len = 1210 nex = 3
Term 36794 36385 0
Intr 37205 37073 0
Init 37590 37308 0

>2244991 /30471
len = 1883 nex = 6
Term 39363 38946 0
Intr 39486 39437 0
Intr 39651 39570 0
Intr 39806 39736 0
Intr 40168 40098 0
Init 40371 40292 0

>2244991 /17535



len = 585 nex =
Sngl 43288 43872

>2244991 /17553
len = 628 nex =
Init 44575 44786
Term 44876 45202

>2244991 /16090
len = 634 nex =
Init 44583 44786
Term 44876 45216

>2244991 /31946
len = 562 nex =
Sngl 66524 65963

>2244991 /6580
len = 509 nex =
Sngl 70265 69757

>2244991 /17851
len = 1752 nex =
Term 71484 71210
Intr 71754 71636
Intr 71898 71846
Intr 72484 72429
Init 72626 72579

>2244991 /92054
len = 587 nex =
Sngl 8564 9150

>2245031 /92144
len = 444 nex =
Sngl 125198 125641

>2245031 /30087
len = 822 nex =
Sngl 125198 126019

>2245031 /118011



len = 355 nex = 1
Sngl 125287 125641 +

>2245031 /91870
len = 1970 nex = 4
Init 144106 144256 +
Intr 144641 144768 +
Intr 145143 145253 +
Term 145583 146075 +

>2245031 /36017
len = 3647 nex = 8
Term 154141 153926
Intr 155021 154948 -
Intr 155252 155139
Intr 155661 155584 -
Intr 155955 155829 -
Intr 156204 156149 -
Intr 156561 156358 -
Init 157572 157241 -

>2245031 /7834
len = 3010 nex = 12
Init 157780 157908 +
Intr 157993 158125 +
Intr 158517 158604 +
Intr 158708 158784 +
Intr 159068 159107
Intr 159412 159497 +
Intr 159590 159671 +
Intr 159798 159854 +
Intr 159938 159976 +
Intr 160067 160137 +
Intr 160354 160407 +
Term 160554 160780

>2245031 /114540
len = 3018 nex = 11
Init 157780 157908 +
Intr 157993 158125 +
Intr 158517 158604 +
Intr 158708 158784 +
Intr 159068 159497 +
Intr 159590 159671 +
Intr 159798 159854 +
Intr 159938 159976 +
Intr 160067 160137 +
Intr 160354 160407 +
Term 160554 160797 +

>2245031 /110681



Reference No. 2750-942P 



934 





len = 


466 nex = 


2 






Init 


172709 172801 


+ 


0 


D 


Term 


172906 1731 /4 




n 
u 




>224 5031 










len = 


610 nex = 


1 




10 












Sngl 


173847 173242 


- 


0 




>2245031 


/42533 






15 


len = 


1533 nex = 


4 






Init 


17415 17660 


+ 


0 




Intr 


17764 18062 


+ 


0 




Intr 


18331 18410 


+ 


0 


20 


Term 


18499 18947 


+ 


0 




>2245031 


736882 








len = 


2299 nex = 


c 
D 




25 












Term 


173963 173241 


- 


0 




Intr 


174262 174007 


- 


0 




Intr 


174516 17 4406 




u 




Intr 


174824 174614 




0 


30 


Init 


175539 174923 




u 




>2245031 


/14 613 








len = 


673 nex = 


1 




35 












Sngl 


20501 19829 


- 


0 




>224 5031 


/831 






40 


len = 


850 nex = 


3 






Init 


r\ r\ V A A r\ "1 "1 "1 

39954 40111 




U 




Intr 


40198 40248 


+ 


0 




Term 


40330 40796 


+ 


u 


>2245031 /14223

len = 638 nex = 1

Sngl 43095 43370 0

>2245031 /35772

len = 1663 nex = 1

Sngl 48986 49948 + 0




>2245073 /158661

len = 739 nex = 1

Sngl 102245 101507 0




>2245073 /34167

len = 1019 nex = 3

Init 104868 105196 + 0
Intr 105282 105361 + 0
Term 105463 105866 + 0




>2245073 /36603

len = 4481 nex = 11

Term 6893 6584 - 0
Intr 7287 7083 - 0
Intr 7700 7618 - 0
Intr 8129 7990 - 0
Intr 8424 8266 - 0
Intr 9480 8479 - 0
Intr 9839 9542 - 0
Intr 10132 9928 - 0
Intr 10433 10351 - 0
Intr 10748 10609 - 0
Init 11064 10945 - 0




>2245073 /37223

len = 4483 nex = 11

Term 6893 6584 - 0
Intr 7287 7083 - 0
Intr 7700 7618 - 0
Intr 8129 7990 - 0
Intr 8424 8266 - 0
Intr 9480 8479 - 0
Intr 9839 9542 - 0
Intr 10132 9928 - 0
Intr 10433 10351 - 0
Intr 10748 10609 - 0
Init 11066 10945 - 0




>2245073 /6042

len = 959 nex = 1

Sngl 124096 125054 + 0


>2245073 /35156

len = 2133 nex = 7

Init 136139 136418 + 0
Intr 136654 136948 + 0
Intr 137036 137101 + 0
Intr 137200 137329 + 0
Intr 137421 137579 + 0
Intr 137703 137753 + 0
Term 137855 138271 + 0



>2245073 /154342

len = 111 nex = 1

Sngl 140364 140254 - 0


>2245073 /3258

len = 2050 nex = 8

Term 138586 138326 - 0
Intr 138787 138684
Intr 139039 138884 - 0
Intr 139188 139117 - 0
Intr 139338 139291 - 0
Intr 139469 139422 - 0
Intr 139680 139608 - 0
Init 140370 140183








/2161

len = 1690 nex = 5

Init 145051 145144 + 0
Intr 145227 145544 + 0
Intr 145712 145798 + 0
Intr 145888 146021 + 0
Term 146416 146733 + 0


>2245073 /17120

len = 464 nex = 2

Init 145081 145144 + 0
Term 145227 145544 + 0

>2245073 /29150

len = 1072 nex = 3

Init 168520 168924 + 0
Intr 169023 169160 + 0
Term 169230 169591 + 0


>2245073 /23025

len = 2715 nex = 8

Init 181224 181382 + 0
Intr 181935 181992 + 0
Intr 182407 182489 + 0
Intr 182789 183061 + 0
Intr 183152 183204 + 0
Intr 183325 183405 + 0
Intr 183502 183614 + 0
Term 183704 183938 + 0



>2245073 /19505

len = 2035 nex =



Init 189969 190426 + 0 

Intr 190764 190988 + 0 

Intr 191116 191225 + 0

Term 191315 191480 + 0

>2245073 /31781

len = 1939 nex = 4

Init 190050 190426 + 0

Intr 190764 190988 + 0

Intr 191116 191225 + 0

Term 191315 191480 + 0

>2245073 /36521

len = 730 nex = 1

Sngl 190098 190332 + 0

>2245073 /39872

len = 2135 nex = 4

Init 192291 192840 + 0

Intr 193297 193492 + 0

Intr 193589 193720 + 0

Term 194093 194425 + 0

>2245073 /6709

len = 1058 nex = 2

Term 198909 198442 - 0

Init 199499 199146 - 0

>2245073 /94923

len = 739 nex = 2

Init 20607 20828 + 0

Term 20918 21345 + 0

>2245073 /24997

len = 530 nex = 1

Sngl 26357 25828 - 0

>2245073 /33509

len = 1450 nex = 2

Init 38766 39446 + 0

Term 39638 40214 + 0

>2245073 /35260





len = 1557 nex = 3

Term 43961 43535 - 0
Intr 44176 44048 - 0
Init 45091 44398 - 0




>2245073 /27500

len = 1700 nex = 1

Sngl 51675 51471 0

>2245073 /99796

len = 1150 nex = 2

Init 64024 64466 + 0
Term 64647 65171 + 0


>2245073 /31538

len = 1278 nex = 4

Term 79423 79066 - 0
Intr 79725 79528 - 0
Intr 80213 80047 - 0
Init 80343 80313 - 0




>2245073 /26448

len = 2146 nex = 8

Term 87600 87509 - 0
Intr 87818 87699 - 0
Intr 88211 88116 - 0
Intr 88333 88295 - 0
Intr 88636 88458 - 0
Intr 88765 88726 - 0
Intr 88913 88854 - 0
Init 89406 89167 - 0




>2245126 /39922

len = 2134 nex = 5

Term 28671 27817 0
Intr 28825 28745 0
Intr 28988 28913 0
Intr 29183 29080 0
Init 29950 29830 0



>2245126 /37533

len = 1873 nex =

Init 30483 30887 + 0
Intr 30977 31070 + 0
Intr 31153 31292 + 0
Intr 31365 31439 + 0
Intr 31521 31678 + 0



Intr 31762 31823 

Term 31972 32355 

>2245126 /42815 

len = 1514 nex = 

Init 56618 56988 

Intr 57254 57524 

Intr 57621 57791 

Term 57887 58131 

>2252639 /36439 

len = 2305 nex 

Term 112752 112679 

Intr 112953 112837 

Intr 113158 113042 

Intr 113355 113254 

Intr 113539 113444 

Intr 113704 113623 

Intr 113928 113814 

Intr 114069 114018 

Intr 114227 114147 

Intr 114489 114328 

Intr 114748 114572 

Init 114983 114885 

>2252639 /32628 

len = 8176 nex = 

Term 112752 112549 

Intr 112953 112837 

Intr 113158 113042 

Intr 113355 113254 

Intr 113539 113444 

Intr 113704 113623 

Init 113928 113814 



>2252639 /7870

len = 2062 nex =

Init 55275 55373
Intr 55679 55864
Intr 55943 56072
Intr 56168 56248
Intr 56342 56529
Intr 56624 56719
Intr 56822 56915
Intr 57043 57162
Term 57257 57336

>2252639 /42847



len = 2459 nex = 



Init 64066 64204
Intr 65296 65804 + 0
Term 65895 66271 + 0

>2252639 /20756

len = 561 nex = 1

Sngl 66935 66375 0

>2252639 /8355

len = 619 nex = 1

Sngl 67016 66406 0

>2252639 /104398

len = 114 nex = 1

Sngl 67655 67768 + 0

>2252639 /34829

len = 1550 nex = 5

Term 72152 71686
Intr 72324 72213
Intr 72574 72402
Intr 72867 72664
Init 73235 73005

>2252639 /34276

len = 2157 nex = 6

Term 76139 75823
Intr 76346 76218
Intr 76530 76444
Intr 76771 76626
Intr 76952 76898
Init 77979 77037

>2252639 /11108

len = 539 nex = 1

Sngl 79342 78804 0

>2252639 /1269

len = 1433 nex = 3

Term 79851 79679 0
Intr 80212 80012 0
Init 80700 80396 0

>2252639 /5476

len = 835 nex = 3






Init 85064 85271 

Intr 85376 85455 

Term 85554 85898 

>2252639 /35833 

len = 873 nex = 

Init 85064 85271 

Intr 85376 85455 

Term 85554 85936 



>2252639 /1810

len = 878 nex =

Init 85064 85271
Intr 85376 85455
Term 85554 85941

>2252639 /17857

len = 910 nex =

Init 85064 85271
Intr 85376 85455
Term 85554 85972

>2252639 /10862

len = 864 nex =

Init 85068 85271
Intr 85376 85455
Term 85554 85931

>2252639 /22773

len = 2008 nex =

Term 92196 90691
Init 92698 92411

>2252823 /11106

len = 1289 nex =

Sngl 107171 108459

>2252823 /25765

len = 315 nex =

Sngl 1671 1357

>2252823 /38970



len = 2486 nex =



Sngl 29968 30145 






>2252823 /15741

len = 3070 nex =

Init 29968 30145
Intr 30436 30547
Term 30642 31104

>2252823 /28637

len = 2900 nex =

Init 35493 36349
Intr 36852 37326
Term 37673 38392

>2252823 /21038

len = 495 nex =

Sngl 37895 38389

>2252823 /35506

len = 582 nex =

Sngl 50035 49454

>2252823 /39479

len = 1604 nex =

Term 56402 56064
Intr 57185 56486
Init 57649 57493

>2252823 /36326

len = 2392 nex =

Term 64455 64054
Intr 64734 64625
Init 65205 64824

>2252823 /31027

len = 1150 nex =

Init 94085 94153
Term 94219 95230

>2252848 /111719

len = 733 nex =

Sngl 46064 45332

>2252848 /11036






len = 790 nex =

Sngl 46089 45304

>2252848 /3204

len = 833 nex =

Sngl 60597 61429

>2252848 /22161

len = 670 nex =

Sngl 63070 63731

>2252848 /22348

len = 740 nex =

Sngl 65608 64869

>2252848 /28082

len = 1216 nex =

Init 80915 80991
Intr 81337 81552
Term 81645 81897

>2252848 /26442

len = 1210 nex =

Init 80915 80991
Intr 81337 81552
Term 81645 81895

>2252848 /37305

len = 1575 nex =

Term 91905 91570
Intr 92168 92002
Intr 92528 92246
Init 92758 92613

>2252848 /37175

len = 2050 nex =

Term 95449 94674
Intr 95668 95551
Init 96720 96101

>2262097 /22611

len = 1439 nex =

Init 31 168



Intr 253 403 +
Intr 481 885 +
Term 969 1469 +


>2262097 /37663

len = 1694 nex = 2

Term 48814 47723 -
Init 49413 49234 -




>2262097 /37704

len = 1990 nex = 6

Term 4521 4199
Intr 4778 4665 -
Intr 5379 5207
Intr 5540 5489 -
Intr 5680 5632 -
Init 6186 5782 -




>2262097 /112955

len = 2350 nex = 4

Term 89371 88825 -
Intr 89563 89456 -
Intr 89803 89654 -
Init 91172 90509






>2262135 /41490

len = 1454 nex = 2

Term 2318 1916 -
Init 3369 2625 -






/20167

len = 1304 nex = 2

Term 4241 3765 -
Init 5068 4768 -














>2262135 /32291

len = 1390 nex = 3

Term 3887 3685
Intr 4241 4100
Init 5072 4768






>2262135 /6568

len = 2212 nex = 6

Term 55501 55152 - 0
Intr 55716 55591 - 0
Intr 55868 55793 - 0
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945 





Intr 56088 55950 - 0
Intr 56564 56483 - 0
Init 57080 56653 - 0


>2262135 /10207

len = 2063 nex = 4

Init 59951 60024 + 0
Intr 60681 60762 + 0
Intr 61016 61098 + 0
Term 61517 61813 + 0




>2262135 /18545

len = 647 nex = 1

Sngl 6145 6791 0


>2262135 /4346

len = 2939 nex = 6

Init 70603 71150 + 0
Intr 71555 71677 + 0
Intr 7184? 71907 + 0
Intr 71994 72059 + 0
Intr 72734 72814 + 0
Term 72893 73541 + 0














>2262135 /26127

len = 817 nex = 1

Sngl 10051 10199 0




>2262135 /8114

len = 1879 nex = 4

Init 97068 97416 + 0
Intr 98158 98297 + 0
Intr 98468 98540 + 0
Term 98650 98946 + 0












>2262135 /34186

len = 347 nex = 1

Sngl 97069 97415 + 0

>2262135 /145375

len = 319 nex = 1

Sngl 10051 10164 + 0




>2262135 /18454

len = 354 nex = 1
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Sngl 


10051 


10199 


>2262135 /27915

len = 1173 nex =

Init 99470 99712
Intr 99822 99870
Term 99982 100642




/1441

len = 657 nex =

Sngl 23119 22463


>2262155 /38365

len = 2443 nex =

Term 33741 33609
Intr 33874 33812
Intr 34038 33961
Intr 34207 34130
Intr 34357 34283
Intr 34542 34456
Intr 35004 34864
Intr 35174 35106
Intr 35320 35254
Intr 35536 35471
Init 36051 35849


>2262155 /2578

len = 2710 nex =

Term 41819 41536
Intr 42007 41945
Intr 42177 42100
Intr 42353 42276
Intr 42507 42433
Intr 42691 42605
Intr 42920 42792
Intr 43144 43004
Intr 43300 43232
Intr 43448 43382
Intr 43690 43625
Init 44238 44044



>2262155 /10042

len = 1776 nex =

Init 47118 47195
Intr 47279 47459
Intr 47575 47672
Term 47837 48384

>2262155 /13246
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len = 


1990 


nex = 


6 






Init 54079 54165 + 0
Intr 54255 54346 + 0
Intr 54432 54540 + 0
Intr 54640 54675 + 0
Intr 54764 54850 + 0
Term 54940 55113 + 0














>2262155 /34698

len = 1459 nex = 6

Init 56211 56260 + 0
Intr 56344 56556 + 0
Intr 56654 56802 + 0
Intr 56878 57034 + 0
Intr 57160 57252 + 0
Term 57530 57669 + 0




>2262155 /39211

len = 2110 nex = 2

Init 64477 65546 + 0
Term 66273 66579 + 0




>2262155 /19601

len = 2050 nex = 2

Init 64534 65546 + 0
Term 66273 66579














>2262155 /32751

len = 850 nex = 1

Sngl 77445 76604 0




>2262155 /3276

len = 1167 nex = 1

Sngl 8628 9794 + 0




>2264302 /38370

len = 1450 nex = 2

Term 35101 34004 0
Init 35452 35188 0


>2264302 /9562

len = 2074 nex = 0

>2264302 /28046
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948 





len = 1581 nex = 4

Term 51719 51257 - 0
Intr 52040 51910 - 0
Intr 52474 52402 - 0
Init 52837 52724 - 0






/16428

len = 1571 nex = 4

Term 51818 51294 - 0
Intr 52040 51910 - 0
Intr 52474 52402 - 0
Init 52864 52724 - 0




>2264302 /100085

len = 1254 nex = 3

Term 5287 4881 - 0
Intr 5613 5357 - 0
Init 6134 5782 - 0


>2264303 /22

len = 1735 nex = 6

Init 14289 14642 + 0
Intr 14799 14910 + 0
Intr 15002 15095
Intr 15228 15405 + 0
Intr 15488 15557 + 0
Term 15638 16023 + 0


>2264303 /7145

len = 824 nex = 4

Init 3387 3465 + 0
Intr 3544 3666 + 0
Intr 3754 3870 + 0
Term 3947 4205 + 0


>2264303 /4273

len = 1845 nex = 3

Term 45044 44650 0
Intr 45266 45126 0
Init 46494 46178 0



>2264303 /35612

len = 1469 nex =

Init 58748 59002 +
Intr 59229 59277 +
Intr 59634 59833 +
Term 59930 60216
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949 



>2264303 /42336

len = 1825 nex = 4

Term 64023 63682 - 0
Intr 64570 64473 - 0
Intr 65089 64989 - 0
Init 65506 65289 - 0


>2264304 /34402

len = 2558 nex = 5

Init 20281 20902 + 0
Intr 21285 21510 + 0
Intr 21627 21849 + 0
Intr 22104 22317 + 0
Term 22554 22838 + 0




/34783

len = 2075 nex = 5

Term 23983 23714 - 0
Intr 24174 24080 - 0
Intr 24709 24267 - 0
Intr 25149 24793 - 0
Init 25788 25400 - 0




/39319

len = 1870 nex = 5

Init 2871 2989 + 0
Intr 3690 3771 + 0
Intr 3960 4165 + 0
Intr 4328 4381 + 0
Term 4476 4733 + 0


>2264304 /9159

len = 1570 nex = 2

Init 41803 42064 + 0
Term 42974 43372 + 0

>2264304 /38464

len = 1270 nex = 1

Sngl 51034 52303 + 0

>2264304 /28578

len = 2110 nex = 5





Init 515 1139 + 0
Intr 1407 1504 + 0
Intr 1754 1853 + 0
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Intr 
2027 2272
Term 2358 2618

>2264304 /41195

len = 353 nex =

Sngl 57898 57549

>2264304 /2871

len = 430 nex =

Init 6595 6647
Term 6733 7019

>2264304 /30073

len = 1810 nex =

Sngl 65320 65030

>2264304 /32071

len = 1128 nex =

Sngl 67814 67283

>2264304 /103464

len = 1096 nex =

Sngl 67814 67316

>2264304 /17818

len = 1136 nex =

Sngl 67814 67277

>2264304 /24095

len = 596 nex =

Sngl 72223 72818

>2264304 /111741



len = 2898 nex = 

Init 77610 77692 

Intr 78044 78153 

Intr 78600 78734 

Intr 78876 79022 

Intr 79400 79483 

Intr 79589 79635 

Intr 79729 79802 

Intr 79915 79973 

Term 80152 80212 
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>2264305 
len = 



/10263 
1493 nex - 



Init 31119 31386
Intr 31604 31784
Intr 31864 32005
Intr 32090 32159
Term 32249 32611



>2264305 /98400

len = 993 nex =

Term 4415 4173
Intr 4868 4742
Init 5152 4965

>2264305 /36333

len = 1450 nex =

Term 4415 4119
Intr 4868 4742
Intr 5244 4965
Init 5422 5374

>2264305 /121728

len = 550 nex =

Term 5244 5080
Init 5422 5374

>2264305 /41072

len = 1312 nex =

Term 4415 4326
Intr 4868 4742
Intr 5244 4965
Init 5422 5374

>2264305 /24983

len = 599 nex =

Sngl 64677 64079

>2264305 /16865

len = 1615 nex =

Init 71009 71096
Intr 71447 71574
Intr 71737 71841
Term 72035 72347

>2264305 /35698
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len — 


1150 


nex = 


Init 71025 71096
Intr 71447 71574
Intr 71737 71841
Term 72035 72162

>2264306 /21505

len = 1450 nex =

Term 10517 10132
Intr 11048 10721
Init 11577 11269

>2264306 /19024

len = 715 nex =

Term 14439 14066
Init 14777 14527

>2264306 /33140

len = 1450 nex =

Term 14439 13966
Intr 14854 14527
Init 15411 14979

>2264306 /121213

len = 333 nex =

Sngl 2596 2928

>2264306 /39888

len = 2203 nex =

Term 35099 34644
Intr 35279 35181
Intr 35475 35371
Intr 35651 35559
Intr 35855 35763
Intr 36011 35958
Intr 36218 36117
Intr 36369 36295
Init 36846 36503

>2264306 /11054

len = 1417 nex =

Init 41110 41228
Intr 41333 41424
Intr 41763 41818
Intr 42120 42181
Term 42324 42526
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>2264306 

/3699

len = 1897 nex =

Init 5030 5266
Intr 5420 6238
Intr 6325 6526
Term 6551 6926

>2264306 /6637

len = 1428 nex =

Term 80382 79690
Intr 80764 80484
Init 81117 80852

>2264306 /111669

len = 382 nex =

Init 88535 88581
Term 88664 88916

>2264307 /42441

len = 682 nex =

Term 48650 48344
Init 49017 48966

>2264307 /22848

len = 658 nex =

Term 48650 48368
Init 49017 48966

>2264307 /145394

len = 638 nex =

Term 48650 48388
Init 49017 48966

>2264307 /11511

len = 776 nex =

Term 48650 48252
Init 49027 48966

>2264307 /12330

len = 670 nex =

Term 48650 48363
Init 49017 48966

>2264307 /37668
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954 





len = 2959 nex = 12

Term 58676 58435 - 0
Intr 58819 58762 - 0
Intr 59006 58939 - 0
Intr 59148 o y u o y
Intr 59415 59374 - 0
Intr 59547 59504 - 0
Intr 59753 59684 - 0
Intr 60223 60104 - 0
Intr 60499 60481 - 0
Intr 60688 60616 - 0
Intr 60911 60847 - 0
Init 61393 61056 - 0




>2264307 /24058

len = 1653 nex = 4

Init 72492 72816 + 0
Intr 73287 73411 + 0
Intr 73485 73593 + 0
Term 73888 74144 + 0














>2264308 /1935

len = 1396 nex = 1

Sngl 17599 16204 - 0




>2264308 /22483

len = 2981 nex = 8

Term 4792 4416 - 0
Intr 5296 4866 - 0
Intr 5495 5375 - 0
Intr 5737 5588 - 0
Intr 6028 5823 - 0
Intr 6224 6110 - 0
Intr 6544 6307 - 0
Init 7396 7131 - 0


>2264309 /37959

len = 357 nex = 1

Sngl 16800 16444 0














>2264309 /15155

len = 872 nex = 2

Init 22581 22830 + 0
Term 22927 23337 + 0



>2264309 /36334

len = 4030 nex =
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Term 
23729 23461
Intr 23957 23827
Intr 24155 24049
Intr 24319 24241
Intr 24499 24413
Intr 26484 26236
Intr 26721 26572
Init 27488 26913

>2264309 /109246

len = 614 nex =

Sngl 36598 37211

>2264309 /34868

len = 2755 nex =

Init 56456 56771
Intr 57170 57262
Intr 57346 57427
Intr 57612 57708
Intr 57802 57877
Intr 58009 58067
Intr 58236 58358
Intr 58523 58580
Intr 58667 58752
Term 58834 59210

>2264310 /99461

len = 692 nex =

Sngl 11215 11906

>2264310 /15761

len = 2548 nex =

Term 19001 18686
Intr 19291 19099
Intr 19675 19440
Intr 19965 19793
Intr 20557 20507
Init 21233 20635

>2264310 /11083

len = 565 nex =

Sngl 2390 2039

>2264310 /31527

len = 589 nex =

Sngl 45291 45879
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>2264310 



/17408

len = 642 nex = 1

Sngl 75188 74547 - 0


>2264310 /125083

len = 1961 nex = 5

Init 8184 8440 + 0
Intr 8574 8786 + 0
Intr 8879 9037 + 0
Intr 9616 9684 + 0
Term 9797 10144 + 0


>2264311 /32868

len = 1724 nex = 5

Term 22845 22268 - 0
Intr 23036 22924 - 0
Intr 23230 23115 - 0
Intr 23684 23307 - 0
Init 23977 23868 - 0


>2264311 /6256

len = 970 nex = 3

Term 61688 61655 - 0
Intr 61915 61777 - 0
Init 62223 62000 - 0


>2264311 /125951

len = 2213 nex =

Term 60708 60456 - 0
Intr 60920 60814 - 0
Intr 61074 61009 - 0
Intr 61491 61410 - 0
Intr 61688 61644 - 0
Intr 61915 61777 - 0
Intr 62223 62000 - 0
Init 62668 62430 - 0


>2264311 /27195

len = 1880 nex = 7

Term 82920 82401 0
Intr 83150 83009 0
Intr 83482 83243 0
Intr 83616 83581 0
Intr 83788 83708 0
Intr 83928 83871 0
Init 84280 84011 0

>2264312 /14950
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10 
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25 
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55 
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len = 


881 nex = 1

Sngl 27808 26928 - 0


>2264312


/ Q ^ A ^ ^ 






len — 


1 jlO IlCA 


5 




Term 


4 1 o Z o liool 




0 


Intr 


42031 41958 


- 


0 


T-f y. 

intr 


/lOOQtr 4 9119 




0 


Intr 


42519 42450 


- 


0 


lnit 


49741 49601 

ft Z ' " -1- 4 L Uu 1 




0 


>2264312








len — 


412 nex = 


1 




Sngl 


4bJj-L3 4 J 3 1 J 




0 


>2264312 








len = 




1 




Sngl 


4o4Xy 3 J. ~> 




0 


>2264312 


/zuyuo 






len = 


looo nex — 


]_ 




Sngl 


4704/ 4 jyio 




0 


>2264312 


/121153 






len = 


1599 nex = 


0 




>2264312


/ z ± 0 / z 






len = 


1 QQQ r^i^v — 

lyyy nex — 


5 




lnit 


7 617 o /b4 jy 


+ 


0 


Intr 


76875 77278 


+ 


0 


Intr 


77349 77609 


+ 


0 


Intr 


77680 77802 


+ 


0 


Term 


7 7 0 0 4 / 0 1 / b 




0 


>2264312


/ A fi 9 R 9 
/ 4 U Z 0 Z 






len = 929 nex = 3

Init 8129 8281 + 0
Intr 8374 8529 + 0
Term 8834 9057 + 0


>2264313 /13012

len = 2530 nex = 3

Init 50735 51416 + 0
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Intr 


51723 52053 + 0
Term 52969 53262 + 0


>z z o4 jIo 


/ ± D D ._> 1 ~) 






len = 


1 R Q 7 n p» v = 

1 J3 / I It; A 


4 




Term 


Ci:i R R Q 4 £ 
OO-L-7' D D -? *i O 




0 


Intr 


56442 56319 


- 


0 


Intr 


57210 56988 


- 


0 


Init 


57542 57464 


- 


0 


>2264314


/ p c. o c: 






len = 


looc i it; a 


4 




Term 


iuud / yiuj 




o 


Intr 


10250 10148 


- 


0 


Intr 


10433 10340 


— 


0 


Init 


1 AQQO T fl ft ^ R 




o 


>2264314


/ 1 i 

/ 11 JDI 4 ! 






len = 1259 nex = 2

Term 26540 26126 - 0
Init 27384 26837 - 0


>2264314 


/38996 






len = 


2313 nex = 


1 




Term 


27833 27526 




0 


Intr 


OPn/l Q 9 7 Q ft 4 




0 


Intr 


28349 28278 




0 


Intr 


zoolo zo4 yz 




o 


Intr 


29046 28886 


- 


0 


intr 


9Q1 TC. ?Q1^1 




0 


Init 


29838 29580 


- 


0 


>2264314 /32785

len = 1499 nex = 1

Sngl 41738 42167 + 0

>2264314 /20245

len = 1429 nex = 1

Sngl 41738 42147 + 0

>2264314 /5592

len = 1450 nex = 0

>2264314 /13819

len = 1390 nex = 1





Sngl 41738 42167 + 0




>2264314 /29726

len = 673 nex = 1

Sngl 46055 46727 + 0

>2264314 /41900

len = 567 nex = 1

Sngl 46131 46697 + 0

>2264314 /2462

len = 570 nex = 1

Sngl 46131 46700 + 0














>2264314 /16750

len = 585 nex = 1

Sngl 46131 46715 0

>2264314 /18232

len = 1571 nex = 5

Term 48315 47879 - 0
Intr 48456 48413 - 0
Intr 48598 48541 - 0
Intr
Init 49449 49182 - 0




>2264314 /9012

len = 1870 nex = 0

>2264314 /7365

len = 1776 nex = 0

>2264314 /33059

len = 2811 nex = 7

Term 61633 61320 - 0
Intr 61973 61823 - 0
Intr 62227 62054 - 0
Intr 62409 62320 - 0
Intr 62646 62576 - 0
Intr 63811 62772 - 0
Init 64130 63836 - 0

>2264314 /27647

len = 1370 nex = 3






Init 72212 72591 

Intr 72849 73086 

Term 73196 73581 

>2264315 /10218 

len = 2270 nex =

Term 26015 25438 

Intr 26141 26094 

Intr 27175 26240 

Init 27707 27384 

>2264315 /29462 

len = 1139 nex =

Init 45117 45873 

Term 45961 46255 

>2264315 /14965 

len = 430 nex = 

Sngl 47036 46610 

>2264315 /114307 

len = 464 nex = 

Sngl 47105 46642 

>2264315 /3363 

len = 636 nex =

Sngl 47111 46476 

>2264315 /41666 



len = 2157 nex =

Init 59476 59703
Intr 59800 59887
Intr 60015 60074
Intr 60160 60192
Intr 60278 60355
Intr 60433 60476
Intr 60582 60622
Intr 60709 60791
Intr 60876 60967
Intr 61055 61124
Intr 61205 61246
Term 61348 61632



>2264316 /31759 

len = 1810 nex = 

Term 40887 40024 

Intr 41245 40976 



Init 41826 41375

>2264316 /4716

len = 1150 nex =

Term 48078 47771
Intr 48347 48169
Intr 48549 48448
Init 48918 48760

>2264316 /35357

len = 3430 nex =

Init 4937 5508
Term 7116 8360

>2264316 /13418

len = 1121 nex =

Term 50134 49841
Intr 50452 50271
Intr 50665 50567
Init 50961 50832

>2264316 /25839

len = 1733 nex =

Term 52037 51717
Intr 52799 52621
Intr 52994 52893
Init 53449 53248

>2264316 /5103

len = 566 nex =

Term 56108 55749
Init 56314 56188

>2264316 /25723

len = 1118 nex =

Init 70502 70609
Intr 70687 70765
Term 71265 71619

>2264316 /28686

len = 1761 nex =

Init 73159 73478
Intr 73823 73864
Intr 74151 74238
Intr 74355 74436
Term 74532 74919



>2264316 /33187

len = 1316 nex =

Init 75294 75411
Intr 75493 75533
Intr 75623 75723
Intr 75977 76121
Intr 76215 76304
Term 76389 76609

>2264316 /40559

len = 940 nex =

Init 75623 75723
Intr 75977 76121
Intr 76215 76304
Term 76389 76430

>2264317 /27304

len = 1450 nex =

Init 10536 10865
Intr 11094 11307
Intr 11430 11575
Term 11678 11977

>2264317 /41386

len = 2230 nex =

Init 18624 18806
Intr 19320 19433
Intr 19544 19688
Intr 19786 19863
Intr 19964 20076
Intr 20166 20269
Term 20364 20848

>2264317 /19638

len = 1116 nex =

Term 39626 39380
Intr 39837 39741
Intr 39994 39932
Intr 40263 40110
Init 40495 40353

>2264317 /6734

len = 2230 nex =

Init 43041 43121
Intr 43615 43732
Intr 43820 43927
Intr 44029 44153








Intr 44256 44520 + 0
Intr 44612 44680 + 0
Intr 44773 44934 + 0
Term 45031 45269 + 0














>2264318 /3797
len = 3010 nex = 11
Term 14549 14209 - 0
Intr 14698 14642 - 0
Intr 14911 14777 - 0
Intr 15084 15004 - 0
Intr 15230 15162 - 0
Intr 15408 15334 - 0
Intr 15837 15757 - 0
Intr 16050 15932 - 0
Intr 16304 16139 - 0
Intr 16522 16393 - 0
Init 17210 16609 - 0

>2264318 /33231
len = 1510 nex = 4
Term 19006 18683 - 0
Intr 19387 19102 - 0
Intr 19635 19485 - 0
Init 20191 19835 - 0

>2264318 /42276
len = 681 nex = 1
Sngl 24794 24114 - 0

>2264318 /26752
len = 754 nex = 1
Sngl 6372 6627 + 0

>Z2. o4 jlo /25855
len = 2410 nex = 4
Init 74093 74435 + 0
Intr 74770 74907 + 0
Intr 75288 75359 + 0
Term 75730 76502 + 0

>2264319 /37985
len = 1041 nex = 3
Term 29497 28961 - 0
Intr 29867 29820 - 0
Init 30001 29945 - 0



>2264320 /36697






len = 3070
Init 77774 77917
Intr 78577 78735
Intr 78827 78886
Intr 79001 79047
Intr 79159 79212
Intr 79302 79479
Intr 79602 79754
Intr 79848 79913
Intr 80000 80047
Intr 80127 80233
Intr 80327 80405
Term 80513 80843



>2264321 /22350
len = 1378 nex =
Sngl 42485 43862

>2264321 /17195
len = 2639 nex =
Term 44138 43885
Intr 44545 44459
Intr 44723 44638
Intr 45000 44910
Intr 45164 45079
Intr 45369 45261
Intr 45565 45519
Intr 45725 45656
Init 46523 45984

>2264321 /4025
len = 1845 nex =
Term 62897 62608
Intr 63027 62981
Intr 63329 63108
Intr 63461 63409
Intr 63720 63555
Intr 63853 63812
Intr 64024 63940
Init 64452 64119

>2264321 /226
len = 1214 nex =
Init 64562 64824
Intr 65227 65327
Term 65506 65775

>2264321 /25843
len = 1179 nex =






Sngl 64602 65780

>2264367 /13226
len = 760 nex =
Sngl 17702 16945

>2264367 /6280
len = 1721 nex =
Init 79635 80401
Intr 80649 80739
Intr 80875 81047
Term 81136 81355

>2264367 /14253
len = 394 nex =
Sngl 79694 80087

>2264367 /2093
len = 1697 nex =
Term 81924 81450
Intr 82092 82014
Intr 82411 82172
Init 83146 82545

>2275194 /35109
len = 1541 nex =

>2275194 /20378
len = 540 nex =
Sngl 46427 45888

>2275194 /6324
len = 564 nex =
Sngl 81129 81692

>2275194 /95662
len = 550 nex =

>2275194 /21715
len = 339 nex =
Sngl 81340 81678

>2275194 /34414








len = 2177 nex = 6
Init 1529 1687 + 0
Intr 1807 1877 + 0
Intr 2195 2314 + 0
Intr 2406 2524 + 0
Intr 2616 2697 + 0
Term 2789 3076 + 0

>2275194 /35584
len = 2064 nex = 6
Init 1529 1687 + 0
Intr 1807 1877 + 0
Intr 2195 2314 + 0
Intr 2406 2524 + 0
Intr 2616 2697 + 0
Term 2789 3017 + 0



>2281081 /99937
len = 1179 nex = 4
Init 17994 18277 +
Intr 18570 18617 +
Intr 18757 18836 +
Term 18973 19172 +

>2281081 /34407
len = 1614 nex = 2
Term 20892 20042
Init 21655 20980

>2281081 /24415
len = 1398 nex = 1
Sngl 37217 3/695

>2281081 /21866
len = 3043 nex = 10
Init 41405 41802 + 0
Intr 41989 42098 + 0
Intr 42186 42243 + 0
Intr 42347 42610 + 0
Intr 42881 43018 + 0
Intr 43151 43202 + 0
Intr 43288 43367 + 0
Intr 43475 43534 + 0
Intr 43663 43743 + 0
Term 44186 44447 + 0



>2281081 /117763








len = 168 nex = 1
Sngl 44280 44447 + 0

>2281081 /37969
len = 1630 nex = 2
Init 45636 46252 + 0
Term 46437 47256 + 0

>2281081 /97249
len = 1570 nex = 3
Init 75474 75567 + 0
Intr 75664 75773 + 0
Term 76110 76381 + 0

>2288979 /30737
len = 1957 nex = 6
Term 23749 23549 - 0
Intr 24382 24215 - 0
Intr 24583 24465 - 0
Intr 24734 24673 - 0
Intr 24906 24830 - 0
Init 25505 25278 - 0

>2288979 /42038
len = 2417 nex =
Term 26123 25700 - 0
Intr 26352 26213 - 0
Intr 26728 26523 - 0
Intr 27113 27007 - 0
Intr 27509 27330 - 0
Init 28116 27832 - 0

>2288979 /5460
len = 1369 nex = 2
Term 61213 60939 - 0
Init 61831 61648 - 0

>2288979 /31535
len = 971 nex = 1
Sngl 6953 5983 - 0

>2288979 /15927
len = 582 nex = 2
Init 83467 83577 + 0
Term 83732 84048 + 0






>2288979 /14769
len = 598 nex =
Init 84968 85307 +
Term 85318 85559 +

>2288979 /22360
len = 621 nex =
Init 84968 85076 +
Intr 85239 85307 +
Term 85318 85582 +

>2288979 /8155
len = 637 nex =
Init 84968 85076 +
Intr 85239 85307 +
Term 85318 85598 +

>2288979 /91704
len = 685 nex =
Init 85879 86099 +
Term 86237 86563 +

>2288979 /14241
len = 594 nex =
Init 85971 86099 +
Term 86237 86564 +

>2288979 /27364
len = 593 nex =
Sngl 87277 87869 +

>2288979 /16079
len = 558 nex =
Init 88140 88265 +
Term 88343 88697 +

>2288979 /85
len = 670 nex =
Init 88140 88265 +
Term 88343 88809 +

>2288979 /35036








len = 510 nex = 2
Term 89987 89866 - 0
Init 90375 90255 - 0

>2326340 /17730
len = 938 nex = 3
Init 12848 12929 + 0
Intr 13222 13294 + 0
Term 13456 13785 + 0

>z jjoutsy /17415
len = 1711 nex =
Init 18997 19833 + 0
Term 20359 20707 + 0

>2335089 /41462
len = 2561 nex = 7
Init 77553 77859 + 0
Intr 78200 78282 + 0
Intr 78527 78615 + 0
Intr 78796 78869 + 0
Intr 78950 79000 + 0
Intr 79347 79408 + 0
Term 79492 80113 + 0

>2337888 /30632
len = 597 nex = 1
Sngl 45399 44803 - 0

>2337888 /33132
len = 1427 nex = 1
Sngl 56372 54946 - 0

>2337888 /25271
len = 1190 nex = 4
Term 81979 81460 - 0
Intr 82251 82069 - 0
Intr 82443 82348 - 0
Init 82649 82529 - 0

>2337888 /36364
len = 2473 nex = 10
Term 81979 81474 - 0
Intr 82251 82069 - 0
Intr 82443 82348 - 0








Intr 82588 82529 - 0
Intr 82726 82673 - 0
Intr 82906 82829 - 0
Intr 83042 82989 - 0
Intr 83230 83147 - 0
Intr 83655 83627 - 0
Init 83946 83768 - 0

>2337888 /48
len = 139 nex = 1
Sngl 84105 83967 - 0

>2337888 /39291
len = 1330 nex = 2
Init 9724 10277 + 0
Term 10380 11048 + 0

>2341023 /20848
len = 2394 nex = 8
Term 105381 105134 - 0
Intr 105770 105531 - 0
Intr 106011 105948 - 0
Intr 106356 106242 - 0
Intr 106669 106531 - 0
Intr 106971 106841 - 0
Intr 107209 107080 - 0
Init 107527 107476 - 0

>2341023 /4513
len = 1287 nex = 3
Term 16071 15822 - 0
Intr 16960 16676 - 0
Init 17108 17082 - 0

>2341023 /26558
len = 1150 nex = 3
Term 23857 23331 - 0
Intr 24045 23945 - 0
Init 24472 24392 - 0

>2341023 /23398
len = 2892 nex = 2
Term 36567 36137 - 0
Init 39028 38927 - 0

>2341023 /40467
len = 2202 nex = 7








Init 41815 41979
Intr 42299 42457
Intr 42564 42739
Intr 42897 43174
Intr 43264 43399
Intr 43492 43603
Term 43692 44016

>2341023 /19832
len = 2656 nex =
Init 45198 45615
Intr 45720 45944
Intr 46040 46752
Term 46898 47342

>2341023 /91880
len = 1118 nex =
Init 46306 46752
Term 46898 47423

>2341023 /8374
len = 805 nex =
Init 84788 85031
Intr 85113 85256
Term 85340 85592

>2341023 /9471
len = 649 nex =
Sngl 85423 85236

>2341023 /30909
len = 1020 nex =
Init 90351 90483
Intr 90571 90628
Term 91104 91353

>2341023 /28606
len = 730 nex =
Sngl 91839 92568

>2341023 /125151
len = 310 nex =
Sngl 96904 96600

>2341023 /33613




len = 2290 nex = 5
Term 94901 94658 - 0
Intr 95464 95403 - 0
Intr 95744 95606 - 0
Intr 96270 96059 - 0
Init 96946 96584 - 0

>2342673 /21644
len = 568 nex = 2
Init 1 19 + 0
Term 287 568 + 0

>2342673 /4236
len = 1031 nex = 1
Sngl 15499 14469 - 0

>2342673 /13218
len = 600 nex = 1
Sngl 59777 59178 - 0

>2342673 /1911
len = 1410 nex = 0

>2342673 /15745
len = 2693 nex = 15
Term 72951 72598 - 0
Intr 73173 73059 - 0
Intr 73327 73268 - 0
Intr 73473 73420 - 0
Intr 73651 73592 - 0
Intr 73809 73747 - 0
Intr 73936 73893 - 0
Intr 74109 74025 - 0
Intr 74283 74203 - 0
Intr 74471 74379 - 0
Intr 74618 74554 - 0
Intr 74789 74714 - 0
Intr 74956 74891 - 0
Intr 75176 75051 - 0
Init 75290 75255 - 0

>2342673 /20814
len = 2669 nex = 14
Term 87698 87414 - 0
Intr 87906 87792 - 0
Intr 88057 87998 - 0
Intr 88219 88166 - 0








Intr 88375 88316 - 0
Intr 88529 88467 - 0
Intr 88664 88621 - 0
Intr 88853 88769 - 0
Intr 89044 88964 - 0
Intr 89241 89149 - 0
Intr 89408 89344 - 0
Intr 89583 89508 - 0
Intr 89751 89686 - 0
Init 89916 89851 - 0

>2342673 /36585
len = 3206 nex = 16
Term 87698 87522 - 0
Intr 87906 87792 - 0
Intr 88057 87998 - 0
Intr 88219 88166 - 0
Intr 88375 88316 - 0
Intr 88529 88467 - 0
Intr 88664 88621 - 0
Intr 88853 88769 - 0
Intr 89044 88964 - 0
Intr 89241 89149 - 0
Intr 89408 89344 - 0
Intr 89583 89508 - 0
Intr 89751 89686 - 0
Intr 89916 89851 - 0
Intr 90281 90192 - 0
Init 90727 90584 - 0

>2342673 /39667
len = 827 nex = 2
Init 95406 95717 + 0
Term 95822 96232 + 0

>2342717 /13928
len = 4710 nex = 16
Term 28916 28495 - 0
Intr 29102 29002 - 0
Intr 29276 29211 - 0
Intr 29479 29365 - 0
Intr 29760 29654 - 0
Intr 29937 29848 - 0
Intr 30204 30094 - 0
Intr 30570 30505 - 0
Intr 30730 30665 - 0
Intr 31414 31265 - 0
Intr 31587 31513 - 0
Intr 32170 32079 - 0
Intr 32332 32267 - 0
Intr 32516 32417 - 0
Intr 32772 32611 - 0
Init 33012 32912 - 0






>2342717 /23892
len = 1550 nex = 4
Term 33902 33442 - 0
Intr 34398 34340 - 0
Intr 34564 34485 - 0
Init 34991 34651 - 0

>2342717 /25519
len = 2805 nex = 5
Term 38674 38181 - 0
Intr 38927 38769 - 0
Intr 39218 39037 - 0
Intr 40474 40303 - 0
Init 40985 40560 - 0

>2351061 /36048
len = 2257 nex = 4
Term 36654 36150 - 0
Intr 37353 37320 - 0
Intr 37883 37644 - 0
Init 38406 38255 - 0

>2351061 /16286
len = 1302 nex = 2
Init 60023 60178 + 0
Term 60434 60780 + 0

>2351061 /25119
len = 2152 nex = 5
Init 72312 72460 + 0
Intr 72978 73443 + 0
Intr 73577 73670 + 0
Intr 73763 73893 + 0
Term 74106 74463 + 0

>2351061 /7022
len = 1348 nex = 1
Sngl 74769 74513 - 0

>2351061 /37512
len = 1737 nex = 0

>2351062 /1575
len = 1492 nex = 2
Init 11143 11366 + 0








Term 11952 12270 + 0

>2351062 /38092
len = 2470 nex = 3
Term 27085 26904 - 0
Intr 28828 27521 - 0
Init 29365 29247 - 0

>2351062 /17241
len = 1404 nex = 3
Init 29965 30040 + 0
Intr 30233 30463 + 0
Term 30712 30955 + 0

>2351062 /31041
len = 2710 nex = 8
Init 50901 51179 + 0
Intr 51563 51664 + 0
Intr 51779 51832 + 0
Intr 52010 52102 + 0
Intr 52264 52356 + 0
Intr 52687 52791 + 0
Intr 52881 52979 + 0
Term 53072 53603 + 0

>2351062 /23924
len = 1277 nex = 3
Init 71481 71998 + 0
Intr 72070 72397 + 0
Term 72483 72757 + 0

>2351063 /114691
len = 1789 nex = 8
Term 20785 20575 - 0
Intr 20954 20889 - 0
Intr 21132 21047 - 0
Intr 21269 21235 - 0
Intr 21455 21369 - 0
Intr 21616 21539 - 0
Intr 21741 21701 - 0
Init 22363 22239 - 0

>2351063 /36626
len = 1476 nex = 6
Term 21132 21053 - 0
Intr 21269 21235 - 0
Intr 21455 21369 - 0
Intr 21616 21539 - 0






Intr 21741 21701
Init 22528 22239

>2351063 /31913
len = 1211 nex =
Init 28196 28319
Intr 28394 28464
Intr 28552 28573
Term 28658 29015

>2351063 /103246
len = 1195 nex =
Init 28196 28319
Intr 28394 28464
Intr 28552 28573
Term 28658 29015

>2351063 /36058
len = 2835 nex =
Init 55242 55559
Intr 55634 55699
Intr 55825 55890
Intr 56186 56264
Intr 56488 56608
Intr 56694 56789
Intr 56864 56976
Intr 57238 57354
Intr 57635 57735
Term 57871 58076

>2351063 /95281
len = 430 nex =
Sngl 58996 59416

>2351063 /108981
len = 314 nex =
Sngl 62819 63132

>2351063 /19716
len = 2088 nex =
Term 66456 66175
Intr 66816 66527
Intr 67192 66895
Intr 67350 67280
Intr 67560 67444
Intr 67709 67635
Intr 67857 67796
Init 68262 68028






>2351063 /18140
len = 3071 nex =
Term 81242 80797
Intr 81474 81378
Intr 81610 81555
Intr 81979 81686
Init 82808 82071

>2351064 /10154
len = 1092 nex =
Init 30526 30610
Intr 30871 30941
Intr 31032 31188
Intr 31364 31450
Term 31536 31617

>2351064 /23922
len = 2156 nex =
Init 30531 30610
Intr 30871 30941
Intr 31032 31188
Intr 31364 31450
Intr 31536 31687
Intr 31802 31882
Intr 31983 32091
Intr 32233 32359
Term 32454 32686

>2351064 /41054
len = 500 nex =
Init 32229 32359
Term 32454 32728

>2351064 /37122
len = 2271 nex =
Term 52016 51678
Intr 52304 52104
Intr 52616 52417
Intr 52811 52698
Init 53187 53050

>2351065 /8508
len = 286 nex =
Sngl 1156 871

>2351065 /29363






len = 8125 nex = 

Term 5274 4953 

Intr 12650 5804 

Init 13070 12743 

>2351065 /3542 

len = 1606 nex = 

Term 12650 12382 

Intr 13557 12743 

Init 13987 13679 

>2351065 /117588 

len = 1433 nex = 



Init 26825 26985
Intr 27076 27149
Term 27414 28257

>2351065 /15229
len = 1952 nex =
Term 28953 28676
Intr 29086 29035
Intr 29404 29169
Intr 29662 29605
Intr 29821 29753
Intr 30022 29914
Intr 30232 30165
Intr 30434 30315
Init 30627 30561

>2351065 /410
len = 1956 nex =
Term 28953 28675
Intr 29086 29035
Intr 29404 29169
Intr 29662 29605
Intr 29821 29753
Intr 30022 29914
Intr 30232 30165
Intr 30434 30315
Init 30630 30561

>2351065 /105944
len = 254 nex =
Sngl 38996 38743

>2351065 /6823
len = 436 nex =








Sngl 420 855 + 0

>2351065 /15640
len = 2139 nex = 7
Term 54303 53997 - 0
Intr 54528 54415 - 0
Intr 54773 54648 - 0
Intr 55027 54948 - 0
Intr 55198 55117 - 0
Intr 55390 55316 - 0
Init 56135 55791 - 0

>2351065 /633
len = 529 nex = 1
Sngl 56522 57050 + 0

>2351065 /104017
len = 1017 nex = 2
Term 62259 61832 - 0
Init 62503 62277 - 0

>2351066 /92216
len = 1063 nex = 1
Sngl 2252 1951 - 0

>2351066 /18332
len = 372 nex = 1
Sngl 51067 50696 - 0

>2351066 /19255
len = 1001 nex = 2
Init 6275 6505
Term 6677 6809

>2351066 /93148
len = 557 nex =
Sngl 64963 64407

>2351066 /9184
len = 1493 nex =
Init 65437 65484
Intr 65563 65622
Term 66328 66800






>2351066 /94924
len = 772 nex = 4
Term 66989 66757 - 0
Intr 67176 67069 - 0
Intr 67314 67274 - 0
Init 67528 67424 - 0

>2351066 /117503
len = 1270 nex = 5
Term 82968 82813 - 0
Intr 83338 83123 - 0
Intr 83553 83453 - 0
Intr 83928 83699 - 0
Init 84064 83998 - 0

>2351067 /24137
len = 592 nex = 1
Sngl 23773 23182 - 0

>2351067 /102435
len = 1553 nex = 1
Sngl 31407 31589 + 0

>2351067 /42506
len = 913 nex = 2
Init 3624 3761 + 0
Term 4109 4536 + 0

>2351067 /37503
len = 1398 nex = 4
Term 39519 39286 - 0
Intr 39736 39638 - 0
Intr 40371 40283 - 0
Init 40683 40599 - 0

>2351067 /23800
len = 1450 nex = 4
Term 39519 39294 - 0
Intr 39736 39638 - 0
Intr 40371 40283 - 0
Init 40735 40599 - 0



>2351067 /12458
len = 192 nex = 1
Sngl 43705 43896 + 0



>2351068 /108814
len = 755 nex = 4
Init 14299 14392 + 0
Intr 14508 14644 + 0
Intr 14771 14817 + 0
Term 14906 15053 + 0

>2351068 /33315
len = 2311 nex = 9
Init 14309 14392 + 0
Intr 14508 14644 + 0
Intr 14771 14817 + 0
Intr 14906 15231 + 0
Intr 15511 15593 + 0
Intr 15693 15768 + 0
Intr 15855 16012 + 0
Intr 16102 16263 + 0
Term 16357 16619 + 0

>2351068 /37265
len = 2272 nex = 9
Init 14347 14392 + 0
Intr 14508 14644 + 0
Intr 14771 14817 + 0
Intr 14906 15231 + 0
Intr 15511 15593 + 0
Intr 15693 15768 + 0
Intr 15855 16012 + 0
Intr 16102 16263 + 0
Term 16357 16618 + 0

>2351068 /777
len = 540 nex = 1
Sngl 22901 23440 + 0

>2351068 /2304
len = 550 nex = 1
Sngl 22901 23442 + 0

>2351068 /15211
len = 560 nex = 1
Sngl 22904 23463 + 0

>2351068 /27372
len = 1870 nex = 3








Init 42505 42957 + 0
Intr 43205 43414 + 0
Term 43963 44371 + 0

>2351068 /5335
len = 731 nex = 3
Init 5218 5304 + 0
Intr 5320 5477 + 0
Term 5551 5931 + 0

>2351068 /22794
len = 857 nex = 1
Sngl 61140 61996 + 0

>2351068 /28601
len = 2200 nex = 7
Init 65723 65950 + 0
Intr 66035 66198 + 0
Intr 66298 66349 + 0
Intr 66544 66771 + 0
Intr 66874 67063 + 0
Intr 67153 67418 + 0
Term 67680 67922 + 0

>2351068 /25211
len = 1220 nex = 1
Sngl 69965 68746 - 0

>2351069 /26016
len = 2002 nex = 7
Init 26231 26670 + 0
Intr 26762 26870 + 0
Intr 26960 27122 + 0
Intr 27209 27357 + 0
Intr 27450 27601 + 0
Intr 27686 27800 + 0
Term 27886 28232 + 0

>2351069 /1271
len = 3480 nex = 9
Init 42775 42864 + 0
Intr 43235 43369 + 0
Intr 43517 43633 + 0
Intr 43791 43942 + 0
Intr 44014 44098 + 0
Intr 44277 44371 + 0
Intr 44852 45017 + 0






Intr 45150 45345
Term 45434 45819

>2351069 /13271
len = 702 nex =
Init 62856 62885
Intr 62964 63042
Term 63127 63557

>2351069 /7744
len = 1618 nex =
Term 67305 66948
Intr 67508 67411
Intr 67723 67598
Intr 67896 67813
Intr 68098 67982
Intr 68261 68178
Intr 68427 68380
Init 68562 68508

>2351069 /3285
len = 3163 nex =
Term 67262 67061
Intr 67508 67411
Intr 67723 67598
Intr 67896 67813
Intr 68098 67982
Intr 68261 68178
Intr 68427 68380
Intr 68562 68508
Intr 68759 68704
Intr 68928 68844
Intr 69102 69029
Intr 69415 69349
Init 70098 70008

>2351070 /97197
len = 697 nex =
Sngl 23957 23261

>2351070 /6363
len = 560 nex =
Sngl 34956 34397

>2351070 /26053
len = 817 nex =
Sngl 46123 46936






>2351071 /17432
len = 2313 nex = 9
Term 46885 46586 - 0
Intr 47174 47088 - 0
Intr 47356 47291 - 0
Intr 47556 47467 - 0
Intr 47720 47640 - 0
Intr 47910 47833 - 0
Intr 48093 48003 - 0
Intr 48436 48295 - 0
Init 48898 48628 - 0

>2351071 /39195
len = 2186 nex = 3
Term 70730 70227 - 0
Intr 71606 71158 - 0
Init 72412 72145 - 0

>2351071 /17360
len = 1402 nex = 3
Term 78193 77927 - 0
Intr 78535 78274 - 0
Init 79311 79168 - 0






>2351071 /26743
len = 1466 nex =
Term 78193 77927
Intr 78535 78274
Init 79392 79168

>2351072 /29659
len = 2508 nex =
Term 22869 22279 - 0
Intr 23128 23019 - 0
Intr 23667 23238 - 0
Intr 23978 23838 - 0
Init 24786 24671 - 0

>2351072 /207148
len = 797 nex =
Sngl 50991 50195

>2351073 /98326
len = 676 nex =
Term 19588 19334
Intr 19757 19681






Init 19996 19838

>2351073 /100141
len = 1717 nex =
Term 19588 19293
Intr 19757 19681
Intr 20220 19838
Intr 20633 20533
Init 21009 20902

>2351073 /115914
len = 116 nex =
Sngl 26710 26595

>2351073 /95599
len = 749 nex =
Term 26967 26608
Intr 27178 27047
Init 27356 27258

>2351073 /35552
len = 1828 nex =
Term 26967 26653
Intr 27178 27047
Intr 27399 27258
Intr 27742 27550
Intr 28087 27842
Init 28480 28170

>2351073 /118777
len = 1030 nex =
Sngl 31871 32900

>2358139 /20380
len = 876 nex =
Init 15794 15936
Intr 16035 16176
Term 16428 16669

>2358139 /29808
len = 1270 nex =
Term 64249 63873
Init 65100 64760

>2358139 /108558








len = 1069 nex = 3
Init 65271 65413 + 0
Intr 65781 65860 + 0
Term 66116 66339 + 0

>2358139 /1730
len = 1484 nex = 3
Init 71725 71848 + 0
Intr 72291 72590 + 0
Term 72701 73208 + 0

>2392762 /8805
len = 1259 nex = 2
Term 30586 29909 - 0
Init 31167 30868 - 0

>2392762 /14724
len = 1796 nex = 8
Term 60877 60621 - 0
Intr 61051 60973 - 0
Intr 61293 61140 - 0
Intr 61514 61420 - 0
Intr 61620 61585 - 0
Intr 61952 61727 - 0
Intr 62107 62037 - 0
Init 62416 62342 - 0

>2392762 /15990
len = 1729 nex = 8
Term 60877 60688 - 0
Intr 61051 60973 - 0
Intr 61293 61140 - 0
Intr 61514 61420 - 0
Intr 61620 61585 - 0
Intr 61952 61727 - 0
Intr 62107 62037 - 0
Init 62416 62342 - 0

>2392762 /41162
len = 951 nex = 3
Init 68249 68350 + 0
Intr 68449 68513 + 0
Term 68901 69199 + 0

>2435510 /32833
len = 1450 nex = 5
Term 41015 40654 - 0






Intr 41265 41098
Intr 41451 41368
Intr 41718 41540
Init 42097 41892

>2435510 /1011
len = 1120 nex =
Term 51801 51490
Intr 52028 51949
Init 52609 52122

>2435510 /19362
len = 1041 nex =
Init 61031 61254
Intr 61359 61535
Term 61610 62071

>2435510 /142314
len = 919 nex =
Init 61151 61254
Intr 61359 61535
Term 61610 62069

>2435510 /33456
len = 2142 nex =
Term 4364 4051
Intr 4676 4612
Intr 5214 5151
Intr 5423 5314
Intr 5600 5513
Init 6192 5794

>2435510 /4367
len = 2039 nex =
Init 76018 76119
Intr 76377 76574
Intr 76648 76707
Intr 76793 77235
Intr 77335 77501
Intr 77587 77660
Intr 77749 77808
Term 77912 78053

>2443899 /22008
len = 1489 nex =
Term 102074 101797
Init 103282 102296




>2443899 /1734 

len = 888 nex = 2 

5 Term 14747 14318 - 0 

Init 15205 14965 - 0 

>2459406 /42992 

10 len = 2396 nex = 10 

Term 117911 117825 - 0 

Intr 118071 117986 - 0 

Intr 118340 118166 - 0 

15 Intr 118518 118458 - 0 

Intr 118661 118595 - 0 

Intr 118838 118754 - 0 

Intr 119077 118920 - 0 

Intr 119310 119166 - 0 

20 Intr 119486 119427 - 0 

Init 119855 119575 - 0 

>2459406 /11254 

25 len = 2035 nex = 6 

Init 128392 128598 + 0 

Intr 128894 129063 + 0 

Intr 129142 129327 + 0 

30 Intr 129412 129577 + 0 

Intr 129681 129870 + 0 

Term 130089 130426 + 0 

>2459406 /92741 

35 

len = 538 nex = 1 

Sngl 141230 140693 - 0 

40 >2459406 /13741 

len = 1713 nex = 4 

Term 18475 18146 - 0 

45 Intr 18628 18567 - 0 

Intr 19123 18713 - 0 

Init 19858 19394 - 0 

>2459406 /25272 

len = 1750 nex = 4 

Init 2679 2985 + 0 

Intr 3377 3419 + 0 

Intr 3511 3571 + 0

Term 3697 4419 + 0 

>2459406 /35273 

len = 2218 nex = 3






Term 26889 26777 -

Intr 28208 27837 -

Init 28994 28459 -

>2459406 /28563

len = 1150 nex =

Term 47656 47428 -

Intr 47792 47751 -

Intr 48158 47874 -

Init 48577 48488 -

>2459406 /119409

len = 468 nex =

Sngl 57470 57023 -

>2459406 /116034

len = 337 nex =

Sngl 61222 61558 +

>2459406 /8717

len = 2113 nex =

Init 66546 66940 + 0

Intr 67084 67181 + 0

Intr 67274 67339 + 0

Term 68443 68658 + 0

>2459406 /31633

len = 945 nex = 2

Init 77435 77674 +

Term 78004 78379 +

>2459406 /19302

len = 2115 nex = 6

Term 80490 80306 - 0

Intr 80717 80586 - 0

Intr 80949 80814 - 0

Intr 81174 81044 - 0

Intr 81479 81424 - 0

Init 82420 82270 - 0

>2459406 /37919

len = 2274 nex = 6

Term 80490 80262 - 0

Intr 80717 80586 - 0

Intr 80949 80814 - 0








Intr 81174 81044 - 0

Intr 81479 81424 - 0

Init 82535 82270 - 0

>2459406 /18894

len = 235 nex = 1

Sngl 85070 85304 + 0

>2477521 /15308

len = 1434 nex = 1

Sngl 11192 12625 + 0

>2477521 /27205

len = 760 nex = 3

Term 22663 22447 - 0

Intr 22864 22743 - 0

Init 23206 22955 - 0

>2477521 /40049

len = 3210 nex = 5

Init 52491 52536 + 0

Intr 52618 52732 + 0

Intr 52824 52891 + 0

Intr 52986 53708 + 0

Term 53792 54336 + 0

>2477521 73549

len = 1750 nex = 4

Init 59783 60056 + 0

Intr 60329 60677 + 0

Intr 60773 60914 + 0

Term 60979 61527 + 0

>2477521 /12293

len = 1796 nex =

Term 71123 70636 - 0

Intr 71380 71205 - 0

Intr 71502 71478 - 0

Intr 71702 71620 - 0

Intr 72024 71951 - 0

Init 72431 72108 - 0

>2477521 /98850

len = 4463

Init 74583 74814 + 0

Intr 77407 77441 + 0






Intr 77553 77614 +

Intr 77696 77795 +

Intr 77904 77945 +

Intr 78281 78322 +

Term 78695 79045 +

>2477521 /92459

len = 4460 nex =

Init 74588 74814 +

Intr 77285 77342 +

Intr 77553 77614 +

Intr 77696 77795 +

Intr 77904 77945 +

Intr 78281 78322 +

Term 78695 79047 +

>2477521 /5076

len = 730 nex =

Term 79591 79372 -

Intr 79924 79697 -

Init 80096 80042 -

>2477521 74033

len = 1930 nex =

Init 94403 94493 +

Intr 94625 94761 +

Intr 94865 94911 +

Intr 94999 95483 +

Intr 95570 95727 +

Intr 95814 95975 +

Term 96051 96327 +

>2494106 /36412

len = 1375 nex =

Term 99606 98923 -

Intr 100124 99692 -

Init 100297 100214 -

>2494106 /11408

len = 644 nex =

Sngl 109531 110174 +

>2494106 /8951

len = 910 nex =

Sngl 112974 112773 -

>2494106 /37020








len = 757 nex = 4

Term 122980 122712 - 0

Intr 123133 123078 - 0

Intr 123278 123220 - 0

Init 123468 123370 - 0

>2494106 /29872

len = 861 nex = 2

Term 122980 122712 - 0

Init 123133 123078 - 0

>2494106 /34434

len = 866 nex = 4

Term 122980 122714 - 0

Intr 123133 123078 - 0

Intr 123278 123220 - 0

Init 123577 123370 - 0

>2494106 /34374

len = 359 nex =

Term 123278 123219 - 0

Init 123577 123370 - 0

>2494106 75465

len = 2050 nex = 7

Init 132597 132734 + 0

Intr 133129 133207 + 0

Intr 133336 133389 + 0

Intr 133680 133793 + 0

Intr 134040 134107 + 0

Intr 134190 134301 + 0

Term 134381 134640 + 0

>2494106 71520

len = 1810 nex =

Init 132677 133207 + 0

Intr 133336 133389 + 0

Intr 133680 133793 + 0

Intr 134040 134107 + 0

Intr 134190 134301 + 0

Term 134381 134477 + 0

>2494106 72681

len = 910 nex = 1

Sngl 143514 143911 + 0

>2494106 /33770






len = 1302 nex =

Term 158712 158351 -

Intr 159059 158976 -

Intr 159236 159156 -

Init 159509 159332 -

>2494106 /27457

len = 1462 nex =

Term 40898 40547 -

Intr 41137 41003 -

Intr 41443 41231 -

Init 42008 41526 -

>2494106 /25255

len = 1719 nex =

Init 54004 54063 +

Intr 54151 54486 +

Term 54639 54877 +

>2494106 /14939

len = 610 nex =

Init 54277 54486 +

Term 54639 54879 +

>2494106 /32130

len = 3130 nex =

Term 56042 55686 -

Intr 56181 56114 -

Intr 56328 56265 -

Intr 56502 56421 -

Intr 56676 56618 -

Intr 56984 56925 -

Intr 57266 57104 -

Intr 57498 57374 -

Intr 57857 57795 -

Intr 58060 58001 -

Intr 58325 58140 -

Init 58811 58689 -

>2494106 /6667

len = 1554 nex =

Sngl 60644 59091 -

>2494106 /25894

len = 1630 nex =

Term 64139 63599 -








Intr 64439 64381 - 0

Intr 64965 64855 - 0

Init 65226 65138 - 0

>2494110 /23300

len = 2036 nex = 5

Term 17775 17469 - 0

Intr 18041 17877 - 0

Intr 18302 18159 - 0

Intr 18618 18423 - 0

Init 19504 19053 - 0

>2494110 78559

len = 1302 nex = 2

Init 25200 25402 + 0

Term 26210 26501 + 0

>2494110 /37952

len = 4214 nex =

Init 25200 25402 + 0

Intr 26210 26290 + 0

Intr 27617 28259 + 0

Intr 28358 28461 + 0

Intr 28571 28709 + 0

Term 28803 29413 + 0

>2494110 /21100

len = 812 nex = 3

Term 30699 30410 - 0

Intr 30921 30796 - 0

Init 31221 30993 - 0

>2494110 /34753

len = 807 nex =

Term 30699 30415 -

Intr 30921 30796 -

Init 31221 30993 -

>2494110 /110726

len = 493 nex =

Sngl 32194 32672 +

>2494110 /2265

len = 494 nex =

Sngl 38819 39312 +






>2494110 /13232

len = 1220 nex =

Sngl 40544 39752 -

>2494110 /31923

len = 1284 nex =

Init 41985 42310 + 0

Intr 42859 42930 + 0

Term 43017 43268 + 0

>2494110 /100984

len = 1340 NO match - No prediction

>2494110 /27110

len = 108 nex = 1

Sngl 74373 74480 + 0

>2494110 /40608

len = 1703 nex = 5

Term 91321 90966 - 0

Intr 91466 91405 - 0

Intr 91657 91540 - 0

Intr 92025 91739 - 0

Init 92668 92298 - 0

>2494110 /2935

len = 1613 nex = 5

Init 97175 97627 + 0

Intr 97725 97897 + 0

Intr 97974 98088 + 0

Intr 98324 98478 + 0

Term 98578 98787 + 0

>2505864 /35333

len = 1426 nex = 4

Init 20951 21020 + 0

Intr 21255 21415 + 0

Intr 21681 21869 + 0

Term 22136 22367 + 0

>2505864 /4328

len = 1374 nex = 4

Init 21013 21070 +

Intr 21255 21415 +

Intr 21681 21869 +

Term 22136 22386 +






>2505873 /6115

len = 1600 nex =

Term 14088 13696 -

Init




>ZjUjo / o 


/ r> o o c 9 

/ J JO 


len — 


134 nex — 


Sngl 


1 y yzz zUUDj 


>2505873 /36699

len = 236 nex =

Sngl 27483 27718 +

>2529657 /32457

len = 1232 nex =

Term 10325 10206 -

Intr 10512 10408 -

Intr 10777 10703 -

Intr 11135 11111 -

Init 11437 11243 -

>2529657 /26123

len = 1422 nex =

Term 10325 10041 -

Intr 10512 10408 -

Intr 10777 10703 -

Intr 11135 11111 -

Init 11462 11380 -

>2529657 /20647

len = 1390 nex =

Term 12109 11630 -

Intr 12283 12185 -

Intr 12499 12362 -

Intr 12722 12592 -

Init 13015 12840 -

>2529657 /28691

len = 2057 nex =

Term 12109 11913 -

Intr 12283 12185 -

Intr 12499 12362 -

Intr 12722 12592 -

Intr 12986 12840 -

Intr 13647 13615 -








Init 13969 13735 - 0

>2529657 /33373

len = 2492 nex = 8

Term 12109 11637 - 0

Intr 12283 12185 - 0

Intr 12499 12362 - 0

Intr 12722 12592 - 0

Intr 12986 12840 - 0

Intr 13647 13615 - 0

Intr 13831 13735 - 0

Init 14128 13974 - 0

>2529657 /24272

len = 1054 nex = 4

Term 17370 17243 - 0

Intr 17555 17463 - 0

Intr 17935 17637 - 0

Init 18296 18094 - 0

>2529657 76394

len = 1870 nex = 5

Term 17370 16988 - 0

Intr 17555 17463 - 0

Intr 17935 17637 - 0

Intr 18295 18094 - 0

Init 18459 18415 - 0

>2529657 /25729

len = 802 nex = 1

Sngl 3834 4635 + 0

>2529657 /37870

len = 3805 nex = 17

Term 46706 46424 - 0

Intr 46947 46882 - 0

Intr 47087 47058 - 0

Intr 47280 47182 - 0

Intr 47466 47371 - 0

Intr 47623 47573 - 0

Intr 47773 47707 - 0

Intr 47950 47856 - 0

Intr 48158 48077 - 0

Intr 48324 48275 - 0

Intr 48463 48413 - 0

Intr 48638 48540 - 0

Intr 49052 48969 - 0

Intr 49302 49192 - 0

Intr 49575 49426 - 0

Intr 49795 49678 - 0






Init 50050 49884 -

>2529657 /32039

len = 670 nex =

Sngl 63987 64654 +

>2529657 /9499

len = 654 nex =

Term 65131 64658 -

Init 65297 65222 -

>2529657 /38461

len = 2350 nex =

Term 65131 64823 -

Intr 65346 65222 -

Intr 65588 65432 -

Intr 65777 65686 -

Intr 65890 65863 -

Intr 66093 65976 -

Intr 66394 66339 -

Intr 66604 66507 -

Intr 66777 66693 -

Init 67165 66986 -

>2529657 /13774

len = 2397 nex =

Term 65131 64823 -

Intr 65346 65222 -

Intr 65588 65432 -

Intr 65777 65686 -

Intr 65890 65863 -

Intr 66093 65976 -

Intr 66394 66339 -

Intr 66604 66507 -

Intr 66777 66693 -

Init 67219 66986 -

>2529657 /34914

len = 717 nex =

Sngl 75255 74539 -

>2529657 /37980

len = 1352 nex =

Sngl 75893 74542 -

>2564044 /156017

len = 401 nex =








Sngl 12975 12575 - 0

>2564044 /156773

len = 350 nex = 1

Sngl 12997 12648 - 0

>2564044 /31129

len = 430 nex = 1

Sngl 13041 12616 - 0

>2564044 /21629

len = 1610 nex = 5

Term 36986 36739 - 0

Intr 37123 37068 - 0

Intr 37318 37272 - 0

Intr 37669 37626 - 0

Init 38348 38232 - 0

>2564044 /22860

len = 3400 nex = 11

Init 5043 5315 + 0

Intr 5670 5734 + 0

Intr 5871 5969 + 0

Intr 6171 6303 + 0

Intr 6748 6807 + 0

Intr 6897 7019 + 0

Intr 7379 7450 + 0

Intr 7562 7699 + 0

Intr 7786 7941 + 0

Intr 8028 8132 + 0

Term 8282 8442 + 0

>2564045 /108335

len = 1516 nex = 2

Term 653 118 - 0

Init 1633 770 - 0

>2564045 /512

len = 1435 nex = 1

Sngl 40196 39668 - 0

>2564045 /40250

len = 1210 nex = 2

Term 57008 56234 - 0

Init 57441 57096 - 0






>2564045 /36090

len = 1219 nex = 2

Term 57008 56234 -

Init 57452 57096 -

>2564045 /33763

len = 1217 nex = 1

Sngl 5886 7102 +

>2564045 /23566

len = 1043 nex = 3

Init 9042 9192 +

Intr 9618 9763 +

Term 9851 10084 +

>2564046 74272

len = 4185 nex = 11

Term 18249 17894 -

Intr 18506 18454 -

Intr 18683 18598 -

Intr 18985 18867 -

Intr 19502 19431 -

Intr 19881 19708 -

Intr 20444 20289 -

Intr 20917 20836 -

Intr 21276 21130 -

Intr 21654 21468 -

Init 22078 21842 -

>2564046 /13993

len = 1672 nex = 6

Init 27089 27339 +

Intr 27573 27725 +

Intr 27820 27972 +

Intr 28179 28262 +

Intr 28344 28485 +

Term 28581 28760 +

>2564046 /35683

len = 697 nex = 3

Term 34417 34208 -

Intr 34609 34504 -

Init 34904 34742 -

>2564047 /12802

len = 1648 nex = 3






Init 16402 16518 +

Intr 17081 17129 +

Term 17663 17714 +

>2564047 /19442

len = 850 nex =

Sngl 21861 22701 +

>2564047 /2533

len = 1779 nex =

Init 37480 37886 +

Intr 37970 38637 +

Term 39199 39258 +

>2564047 /32890

len = 1279 nex =

Sngl 51389 50111 -

>2564047 /13737

len = 1302 nex =

Term 57880 57668 - 0

Intr 58070 58011 - 0

Intr 58297 58197 - 0

Intr 58633 58398 - 0

Init 58969 58725 - 0

>2564047 /6893

len = 1309 nex =

Term 57880 57662 - 0

Intr 58070 58011 - 0

Intr 58297 58197 - 0

Intr 58633 58398 - 0

Init 58970 58725 - 0

>2564047 /114864

len = 2470

Init 59318 59464 + 0

Intr 59652 59723 + 0

Intr 59821 59895 + 0

Intr 60508 60588 + 0

Intr 60854 60923 + 0

Intr 60996 61087 + 0

Intr 61178 61219 + 0

Intr 61298 61378 + 0

Term 61566 61785 + 0

>2564047 /105566






len = 979 nex = 1

Sngl 62070 63048 +

>2564047 /12455

len = 1933 nex = 2

Term 67046 66415 -

Init 68347 67916 -

>2564047 /40711

len = 850 nex = 1

Sngl 78369 77529 -

>2564048 /105906

len = 1212 nex = 2

Init 2380 2769 +

Term 2946 3591 +

>2564048 /115613

len = 586 nex = 1

Sngl 31514 30929 -

>2564048 /1200

len = 1778 nex = 4

Term 38609 37885 -

Intr 38864 38681 -

Intr 39244 38988 -

Init 39662 39331 -

>2564048 /39462

len = 2351 nex = 7

Init 41518 41825 +

Intr 42059 42268 +

Intr 42387 42557 +

Intr 42766 42889 +

Intr 43155 43216 +

Intr 43305 43386 +

Term 43481 43868 +

>2564048 /10292

len = 1951 nex = 7

Init 41667 41825 + 0

Intr 42059 42268 + 0

Intr 42387 42557 + 0

Intr 42766 42889 + 0






Intr 43155 43216 +

Intr 43305 43386 +

Term 43481 43617 +

>2564048 /26637

len = 1980 nex = 7

Init 61938 62027 +

Intr 62306 62497 +

Intr 62586 62757 +

Intr 62859 62932 +

Intr 63011 63037 +

Intr 63126 63149 +

Term 63245 63657 +

>2564048 /158431

len = 1690 nex = 5

Init 65258 65519 +

Intr 65699 65751 +

Intr 65845 65980 +

Intr 66115 66290 +

Term 66365 66942 +

>2564049 /37294

len = 1427 nex = 2

Term 485 294 - 0

Init 1720 625 - 0

>2564049 /104793

len = 1873 nex = 7

Init 17973 18128 + 0

Intr 18663 18789 + 0

Intr 18882 19035 + 0

Intr 19112 19208 + 0

Intr 19304 19392 + 0

Intr 19521 19589 + 0

Term 19790 19845 + 0

>2564049 /141731

len = 2068 nex = 7

Init 18007 18128 + 0

Intr 18663 18789 + 0

Intr 18882 19035 + 0

Intr 19112 19208 + 0

Intr 19304 19392 + 0

Intr 19521 19589 + 0

Term 19790 20074 + 0

>2564049 /21604

len = 557 nex = 1








Sngl 28618 28062 -

>2564049 /16144

len = 1365 nex = 3

Init 28919 29348 +

Intr 29603 29695 +

Term 30029 30283 +

>2564049 /31971

len = 1818 nex = 2

Init 35677 36089 +

Term 36890 37494 +

>2564049 /13667

len = 1704 nex = 6

Term 5026 4812 - 0

Intr 5207 5118 - 0

Intr 5466 5299 - 0

Intr 5691 5572 - 0

Intr 5932 5787 - 0

Init 6515 6354 - 0

>2564050 /6203

len = 2839 nex = 13

Init 12017 12391 + 0

Intr 12485 12567 + 0

Intr 12820 12974 + 0

Intr 13048 13082 + 0

Intr 13144 13293 + 0

Intr 13467 13562 + 0

Intr 13634 13750 + 0

Intr 13832 13951 + 0

Intr 14029 14121 + 0

Intr 14202 14324 + 0

Intr 14407 14523 + 0

Intr 14606 14668 + 0

Term 14766 14842 + 0

>2564050 /123496

len = 674 nex = 1

Sngl 17696 18369 +

>2564050 /16313

len = 1594 nex = 5

Init 2671 2918 + 0

Intr 3227 3325 + 0

Intr 3410 3518 + 0






Intr 3687 3758 +

Term 3993 4264 +

>2564050 /14738

len = 1040 nex = 2

Term 28103 27853 -

Init 28892 28606 -

>2564050 /13951

len = 1063 nex = 2

Term 28103 27848 -

Init 28910 28606 -

>2564050 /38057

len = 2006 nex = 0

>2564051 77688

len = 1722 nex = 2

Term 13311 12928 - 0

Init 13996 13887 - 0

>2564051 /6220

len = 2570 nex = 6

Init 18254 18493 + 0

Intr 18575 18754 + 0

Intr 19785 19904 + 0

Intr 19917 20078 + 0

Intr 20178 20459 + 0

Term 20546 20823 + 0

>2564051 /30648

len = 2334 nex = 9

Init 33401 33589 + 0

Intr 33676 33848 + 0

Intr 34149 34268 + 0

Intr 34373 34429 + 0

Intr 34595 34675 + 0

Intr 34763 34797 + 0

Intr 34933 35006 + 0

Intr 35103 35262 + 0

Term 35380 35734 + 0

>2564051 /30994

len = 2530 nex = 8

Init 45513 45608 + 0

Intr 46036 46115 + 0

Intr 46206 46280 + 0






Intr 46370 46473 + 0

Intr 46561 46717 + 0

Intr 46810 46897 + 0

Intr 46997 47069 + 0

Term 47147 47224 + 0

>2564051 /29619

len = 942 nex = 3

Init 46810 46897 + 0

Intr 46997 47069 + 0

Term 47147 47224 + 0

>2564051 /29829

len = 1317 nex = 3

Term 48114 47710 - 0

Intr 48493 48207 - 0

Init 49026 48809 - 0

>2564051 /6519

len = 1128 nex = 2

Init 72721 72978 + 0

Term 73194 73848 + 0

>2564051 /142033

len = 651 nex = 2

Init 72788 72978 + 0

Term 73194 73438 + 0

>2564051 /14159

len = 1394 nex = 5

Term 74311 74056 - 0

Intr 74603 74398 - 0

Intr 74863 74713 - 0

Intr 75172 74950 - 0

Init 75449 75412 - 0

>2564051 /40866

len = 1519 nex = 5

Term 74311 74064 - 0

Intr 74603 74398 - 0

Intr 74863 74713 - 0

Intr 75172 74950 - 0

Init 75582 75412 - 0

>2564051 /17770

len = 1500 nex = 5




Term 74311 74086 - 0 

Intr 74603 74398 - 0 

Intr 74863 74713 - 0 

Intr 75172 74950 - 0 

Init 75445 75412 - 0 

>2564051 /13949 

len = 1110 nex = 3 

Term 82879 82476 - 0 

Intr 83240 82973 - 0 

Init 83585 83325 - 0 

>2570223 /40832 

len = 2253 nex = 6

Init 17162 17477 + 0 

Intr 17799 17892 + 0 

Intr 18430 18609 + 0 

Intr 18688 18807 + 0 

Intr 18887 19020 + 0 

Term 19185 19414 + 0 

>2570223 /37699 

len = 2869 nex = 9 

Term 26477 25979 - 0 

Intr 26840 26580 - 0 

Intr 27159 26941 - 0 

Intr 27498 27271 - 0 

Intr 27878 27776 - 0 

Intr 28077 27965 - 0 

Intr 28258 28197 - 0 

Intr 28478 28346 - 0 

Init 28847 28757 - 0 

>2570223 /23106 

len = 629 nex = 1

Sngl 74691 75319 + 0 

>2583106 /29207 

len = 2272 nex = 5 

Init 108141 108430 + 0 

Intr 108875 109071 + 0 

Intr 109540 109629 + 0 

Intr 109744 109815 + 0 

Term 110152 110412 + 0 

>2583106 /36389 

len = 643 nex = 1 

Sngl 121587 120945 - 0 








>2583106 /17187

len = 677 nex = 1

Sngl 121618 120942 - 0

>2583106 /23203

len = 704 nex = 2

Init 13956 14106 + 0

Term 14207 14659 + 0

>2583106 /2322

len = 531 nex = 1

Sngl 15480 14950 - 0

>2583106 /26817

len = 1955 nex = 3

Term 15998 14950 - 0

Intr 16202 16119 - 0

Init 16904 16559 - 0

>2583106 /7709

len = 1471 nex = 1

Sngl 3827 5297 + 0

>2583106 /33864

len = 1700 nex = 1

Sngl 64128 64474 + 0

>2583106 /27799

len = 1690 nex = 4

Term 73308 72358 - 0

Intr 73553 73400 - 0

Intr 73796 73648 - 0

Init 74047 73886 - 0

>2583106 /15659

len = 2848 nex = 7

Term 73308 72358 - 0

Intr 73553 73400 - 0

Intr 73796 73648 - 0

Intr 74168 73886 - 0

Intr 74356 74251 - 0

Intr 74536 74446 - 0

Init 75205 74719 - 0




>2583106 /18320 

len = 2568 nex = 7 

Term 73308 72638 - 0 

Intr 73553 73400 - 0 

Intr 73796 73648 - 0 

Intr 74168 73886 - 0 

Intr 74356 74251 - 0

Intr 74536 74446 - 0 

Init 75205 74719 - 0 

>2583106 /21765 

len = 1874 nex = 0 

>2583106 /1969 

len = 5689 nex = 1

Sngl 88355 88684 + 0 

>2583106 /37127 

len = 310 nex = 0 

>2583106 /37621 

len = 3101 nex = 15

Term 89198 88862 - 0 

Intr 89371 89306 - 0 

Intr 89531 89462 - 0 

Intr 89689 89616 - 0

Intr 89891 89793 - 0 

Intr 90037 89976 - 0 

Intr 90178 90137 - 0 

Intr 90316 90265 - 0 

Intr 90541 90442 - 0

Intr 90682 90638 - 0 

Intr 90843 90796 - 0 

Intr 91179 91104 - 0 

Intr 91456 91286 - 0 

Intr 91590 91540 - 0

Init 91962 91806 - 0 

>2584827 /273 

len = 2260 nex = 5

Term 101872 101586 - 0 

Intr 102093 102017 - 0 

Intr 102388 102242 - 0 

Intr 102650 102480 - 0

Init 103150 102928 - 0 

>2584827 /5480 

len = 1994 nex = 3








Term 111925 111310 - 0

Intr 112167 112049 - 0

Init 113303 112263 - 0

>2584827 /5171

len = 319 nex = 1

Sngl 115114 114796 - 0

>2584827 /17426

len = 597 nex = 1

Sngl 115422 114826 - 0

>2584827 /11593

len = 562 nex = 1

Sngl 115422 114861 - 0

>2584827 /25571

len = 610 nex = 1

Sngl 115430 114821 - 0

>2584827 /34348

len = 1756 nex = 8

Term 117147 116843 - 0

Intr 117385 117233 - 0

Intr 117590 117483 - 0

Intr 117734 117687 - 0

Intr 118025 117813 - 0

Intr 118181 118117 - 0

Intr 118386 118262 - 0

Init 118595 118482 - 0

>2584827 /39107

len = 2383 nex = 7

Term 117147 116840 - 0

Intr 117385 117233 - 0

Intr 117590 117483 - 0

Intr 117734 117687 - 0

Intr 118025 117813 - 0

Intr 118181 118117 - 0

Init 118386 118262 - 0

>2584827 /5712

len = 790 nex = 1

Sngl 23900 24682 + 0






>2584827 /27675

len = 745 nex =

Sngl 23978 24722 +

>2584827 /116395

len = 286 nex =

Sngl 29868 29583 -

>2584827 /4503

len = 471 nex =

Sngl 30113 29649 -

>2584827 /22292

len = 592 nex =

Sngl 30233 29642 -

>2584827 /25064

len = 683 nex =

Term 30138 29660 -

Init 30342 30322 -

>2584827 /25142

len = 816 nex =

Term 30138 29596 -

Init 30411 30322 -

>2584827 /1994

len = 835 nex =

Term 30138 29584 -

Init 30418 30322 -

>2584827 /4479

len = 655 nex =

Term 84398 84092 -

Intr 84584 84486 -

Init 84746 84674 -

>2584827 /31676

len = 649 nex =

Term 85483 85211 - 0

Init 85859 85582 - 0




>2584827 /32472

len = 1762 nex = 7

Term 84398 84113 - 0

Intr 84584 84486 - 0

Intr 84807 84674 - 0

Intr 84997 84910 - 0

Intr 85301 85255 - 0

Intr 85483 85417 - 0

Init 85870 85582 - 0

>2584827 /8972

len = 4044 nex = 12

Init 95183 95243 + 0

Intr 95429 95523 + 0

Intr 95608 95720 + 0

Intr 95804 95972 + 0

Intr 96059 96098 + 0

Intr 96231 96295 + 0

Intr 96387 96500 + 0

Intr 96601 96665 + 0

Intr 96783 96939 + 0

Intr 97037 97156 + 0

Intr 97247 97335 + 0

Term 97422 97750 + 0

>2584827 /17473

len = 1881 nex = 9

Init 95871 95972 + 0

Intr 96059 96098 + 0

Intr 96231 96295 + 0

Intr 96387 96500 + 0

Intr 96601 96665 + 0

Intr 96783 96939 + 0

Intr 97037 97156 + 0

Intr 97247 97335 + 0

Term 97422 97751 + 0

>2618599 /23293

len = 776 nex = 1

Sngl 11508 11343 - 0

>2618599 /6500 

len = 997 nex = 1 

Sngl 13043 14039 + 0 

>2618599 /40212 

len = 1750 nex = 3

Term 12611 12066 - 0 








Intr 13271 13086 -

Init 13808 13354 -

>2618599 /39514

len = 1757 nex = 4

Term 23461 23198 -

Intr 24300 24198 -

Intr 24625 24565 -

Init 24954 24707 -

>2618599 /8490

len = 1570 nex = 5

Term 27344 27150 -

Intr 27528 27425 -

Intr 27900 27615 -

Intr 28171 27989 -

Init 28711 28527 -

>2618599 /96

len = 1631 nex = 5

Term 27344 27085 -

Intr 27528 27425 -

Intr 27900 27615 -

Intr 28171 27989 -

Init 28715 28527 -

>2618599 /96124

len = 490 nex = 1

Sngl 61246 60764 -

>2618599 /13096

len = 1229 nex = 3

Init 71559 71780 +

Intr 72273 72395 +


Term 
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